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FOREWORD 


The  DARPA  Smart  Sensor  Program,  under  the  leadership  of  LTC  Carlstrom, 
brought  together  a collection  of  teams  each  of  which  combined  Educational 
and  Industrial  Laboratories.  This  report  documents  the  contributions  of 
the  team  consisting  of  the  University  of  Maryland  and  Westinghouse  Defense 
and  Electronics  System  Center  and  the  US  Army  Night  Vision  and  Electro- 
Optics  Laboratories. 

The  teamwork  of  Government,  Educational  Institution,  and  Industrial 
Laboratory  was,  in  this  case,  highly  productive.  The  Initial  concept 
design  was  influenced  by  an  understanding  of  the  limitations  of  hardware 
implementation  and  the  hardware  design  considerations  validly  reflected 
the  needs  of  system  algorithm  design.  Both  reflect  well  the  constraints  of 
militarily  relevant  missions. 

The  spirit  of  cooperation  and  productivity  were  enhanced  by  monthly 
program  reviews  attended  by  all  three  principal  research  personnel  which 
allowed  many  iterations  of  the  design  process  in  a short  period  of  time. 

The  result  was  a great  decrease  in  the  delay  usually  encountered  on 
transferring  a new  idea  from  basic  research  to  implementation  in  military 
hardware . 
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1.  Introduction 


This  is  the  Final  Report  on  Contract  DAAG53-76-C-0138 , 
"Algorithms  and  Hardware  Technology  for  Image  Recognition", 
covering  research  conducted  during  the  period  1 May  1976 
through  31  January  1978.  The  Contract  was  funded  under  DARPA 
Order  3206,  and  was  monitored  by  the  U.  S.  Army  Night  Vision 
Laboratory,  Fort  Belvoir,  VA.  The  project  monitors  were 
Mr.  John  S.  Dehne  and  Dr.  George  R.  Jones  of  NVL.  The 
principal  investigators  were  Profs.  David  L.  Milgram  and 
Azriel  Rosenfeld  of  the  University  of  Maryland.  A subcontract, 
entitled  "Recognition  Technology  for  a Smart  Sensor",  was 
awarded  to  the  Westinghouse  Defense  and  Electronics  Systems 
Center;  it  was  directed  by  Dr.  Glenn  E.  Tisdale,  program 
manager,  and  Mr.  Thomas  J.  Willett.  Monthly  meeting  were  held 
in  which  key  NVL,  Maryland,  and  Westinghouse  personnel  parti- 
cipated. 

Under  the  contract,  algorithms  for  detecting  and  classify- 
ing tactical  targets  on  forward-looking  infrared  (FLIR) 
imagery  were  developed.  The  subcontract  investigated  the  im- 
plementability  of  these  algorithms  in  charge-coupled  device 
(CCD)  technology,  and  also  successfully  implemented  one  basic 
function,  sorting.  A list  of  the  principal  accomplishments 
under  the  contract  is  given  in  Section  2.  Section  3 reviews 
the  Maryland  work  on  algorithms,  including  image  modelling, 
pre-  and  post-processing,  segmentation,  feature  extraction, 
and  classification.  Section  4 reviews  the  Westinghouse  efforts. 
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A list  of  reports  and  publications  generated  under  the  con- 
tract is  given  in  Section  5. 
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2.  Summary  of  Accomplishments 


1.  Design  and  implementation  of  a comprehensive  algorithm 
for  object  recognition  in  FLIR  imagery  with  a detec- 
tion rate  above  95%  and  a false  alarm  rate  between  1 
and  2 false  alarms  per  frame. 

2.  Fabrication  and  testing  of  a CCD  sorter  chip  capable 
of  operating  at  3 megapixels/sec.  The  sorter  function 
is  a crucial  step  in  several  image  operations  includ- 
ing histogramming,  median  filtering,  non-maximum 
suppression  and  connected  component  coloring. 

3.  Investigation  of  the  cost,  performance  and  constraint 
tradeoff  in  implementing  a target  cueing  algorithm  in 
CCD  (charge-coupled  device)  technology.  The  result- 
ing design  is  within  specifications  for  usage  in  smart 
sensors. 

4.  Development  of  the  "Superslice"  algorithm  for  reliable 
region  extraction  based  on  the  cooccurrence  of  border 
points  of  regions  with  locally  maximum  edge  detector 
responses.  This  is  an  important  example  of  the  use 

of  convergent  evidence  to  strengthen  assertions. 

5.  Design  and  analysis  of  statistical  models  for  threshold 
selection,  image  operation  response  prediction,  and 
optimal  edge  detection. 

6.  A new  method  for  adaptive  quantization  of  an  image 
which  reduces  the  number  of  gray  levels  present  using 
only  the  histogram. 

7.  Comparison  of  image  smoothing  methods,  including 
median  filtering. 

8.  A study  of  shrink/expand  noise  cleaning  schemes,  in- 
cluding a local  min/max  method  which  cleans  the  image 
prior  to  thresholding. 

9 . Evaluation  of  a variety  of  edge  detectors  and  the  de- 
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velopment  of  a reliable  method  for  edge  thinning. 

10.  Construction  of  a "fuzzy"  thinning  algorithm  which 
allows  thinning  to  occur  in  gray  level  images  prior 
to  thresholding. 

11.  Development  of  methods  for  threshold  selection  based 
on  gray  level  and  gradient  value. 

12.  Generalization  of  thresholding  to  the  multiple  object 
class  environment  with  the  ability  to  predict  appro- 
priate (gray  level,  gradient  value)  segmentation  re- 
gions for  the  object  classes  present. 

13.  A variable  thresholding  scheme  which  produces  a binary 
(or  ternary)  representation  of  an  image. 

14.  An  extension  of  threshold  selection  for  sequences  of 
images . 

15.  Simplification  of  the  logic  of  the  standard  connected 
component  coloring  algorithm  and  its  extension  to 
produce  a chain  encoding  of  the  component  boundary  in 
a single  pass. 

16.  Implementation  of  Hyperslice:  a recursive  segmenta- 
tion which  improves  the  Ohlander  region  extraction 
method . 

17.  An  algorithm  for  region  tracking  in  image  sequences 
using  dynamic  programming. 

18.  Comparison  of  features  for  target  recognition. 

19.  Construction  of  a hierarchical  classifier  for  target 
detection  and  recognition. 

20.  Development  of  Viewmaster  - a software  aid  to  assist 
in  the  construction  of  image  processing  programs. 
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The  Westinghouse  effort  complemented  the  Maryland  effort  in 
several  ways.  Initially,  assistance  was  provided  to  Maryland  in 
describing  the  requirements  for  automatic  cueing,  and  in  developing 
a data  base  of  FLIR  imagery.  A brief  description  of  the  function 
of  Westinghouse  cueing  algorithms  was  offered  for  background 
information  only. 

During  the  conception  and  test  of  the  Maryland  cueing  algo- 
rithms, Westinghouse  carried  out  an  investigation  of  techniques 
for  their  implementation,  with  particular  emphasis  on  charge 
transfer  devices.  When  processing  functions  were  specified,  a 
detailed  analysis  was  then  carried  out  so  as  to  determine  the  feasi- 
bility of  implementing  them  in  CCD's.  This  process  continued  through- 
out the  first  year  of  the  program. 

During  the  final  nine  months,  a specific  circuit  was  chosen  for 
the  fabrication  of  a demonstration  unit.  A sorter  function  was 
selected  because  of  its  occurrence  in  several  cueing  operations. 

Chips  were  fabricated  and  tested  at  the  Westinghouse  Advanced 
Technology  Laboratories,  and  a demonstration  unit  was  assembled 
and  shown  at  the  Image  Understanding  Workshop  in  October,  1977. 

The  unit  rearranges  a random  series  of  pulses  in  ascending  order  by 
magnitude . 

An  estimate  was  also  made  of  the  area  in  monolithic  silicon 
required  to  implement  the  cuer  function  in  CCD's.  The  algorithm 
presently  proposed  by  Maryland  would  require  an  area  of  11-1/4 
by  1-1/7  inches.  If  3-inch  bv  3-inch  modules  were  emoloved  with 
1/2-inch  centers,  an  equivalent  volume  would  be  3 inches  by  3 
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inches  by 
missiles , 


inches.  This  volume  has  positive  implications  for 
.e.,  Smart  Missiles  and  other  airborne  platforms. 
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3. 


Algorithms 


3 . 1 Introduction 

This  section  describes  the  contribution  made  by  the 
University  of  Maryland  to  the  state  of  the  art  in  FLIR  target 
recognition  and  computer  vision.  The  subsections  describe 
the  main  areas  of  investigation  and  cover  significant 
advances  in  image  modelling  and  algorithm  development.  Not 
every  idea  investigated  on  the  project  was  equally  successful; 
this  report  reflects  the  main  lines  of  effort,  while  other 
investigations  are  documented  in  the  quarterly  and  semi- 
annual reports  [1-4] . 
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3 . 2 Image  models 


The  work  on  image  modeling  conducted  under  this 
project  was  concentrated  in  three  main  areas: 

1)  Modeling  of  the  joint  (gray  level,  edge  value) 
statistics  of  FLIR  scenes,  as  a basis  for  de- 
fining threshold  selection  techniques. 

2)  Modeling  of  thresholding  and  edge  detection 
responses  to  background  regions,  as  a basis  for 
predicting  false  alarm  rates. 

3)  Modeling  edges  in  images  as  a basis  for  defining 
optimal  edge  detection  operations  and  for  evalu- 
ating edge  detector  output. 

This  work  is  briefly  summarized  in  the  following  subsections. 
References  are  given  to  earlier  project  reports  in  which 
detailed  treatments  can  be  found. 
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3.2.1  Model-based  threshold  selection 


An  approach  to  modeling  FLIR  imagery  has  been  developed, 
based  on  the  simplifying  assumption  that  targets  appear  as 
homogeneous  hot  regions  within  a homogeneous  cooler  surround. 
This  model  describes  the  joint  probability  density  of  gray 
level  and  edge  strength  in  such  images,  for  various  edge- 
detecting operations  [1,2],  In  brief,  the  model  predicts 
that  for  low  edge  values  (corresponding  to  points  in  the 
interiors  of  objects  and  background) , there  should  be  two 
relatively  well  separated  probability  peaks,  of  different 
sizes,  representing  the  gray  levels  of  object  and  background 
interiors,  respectively.  For  higher  edge  values,  corres- 
ponding to  points  on  object/background  borders,  these  peaks 
should  merge  together  and  become  a single  peak  representing 
the  border  range  of  gray  levels. 

The  model  just  described  can  be  used  as  a guide  to 
segmenting  FLIR  images  by  thresholding.  At  low  edge  values, 
it  should  be  easy  to  pick  a threshold  at  a gray  level  in 
the  valley  between  the  two  probability  peaks,  since  these 
are  relatively  well  separated.  At  high  edge  values,  the 
peak  gray  level  value  itself,  or  perhaps  the  mean  gray  level, 
should  be  a good  threshold,  since  this  represents  the  "center" 
of  the  edges.  For  intermediate  edge  values,  one  can  com- 
promise between  these  two  thresholds  in  various  ways.  A 
comparative  study  of  threshold  selection  schemes  based  on 
this  approach  has  been  conducted  [5].  This  work  will  be 
discussed  further  in  Section  3.5. 
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3.2.2  Operator  response  prediction 
a)  Predicting  results  of  thresholding 

Thresholding  images  is  a common  process  and  much 
work  has  been  directed  towards  selecting  the  correct  value 
at  which  to  threshold  and  estimating  the  expected  error. 
Normally,  one  thresholds  only  images  which  contain  some 
signal.  Thresholding  pure  noise  is  to  be  avoided  when  possi 
ble.  Since  there  may  be  occasions  when  thresholding  noise 
is  unavoidable  (e.g.,  a poor  threshold  was  chosen),  it  is 
important  to  predict  the  expected  results.  The  expected 
number  of  above-threshold  regions  that  result  when  noise  is 
thresholded  is  useful  in  planning  for  data  structure  storage 
allocation  and  in  predicting  false  alarm  rates.  When  a bad 
threshold  "breaks  up"  an  object,  knowledge  of  the  expected 
sizes  and  shapes  of  noise  regions  can  be  used  to  help  dis- 
criminate object  fragments  from  noise.  No  methods  currently 
exist  for  predicting  the  number  of  connected  components  of 
thresholded  spatially  correlated  signal  (or  noise) . However 
it  has  been  found  possible  [6]  to  estimate  the  moments  of 
regions,  the  density  of  border  points,  and  lower  bounds  on 
the  number  of  connected  components  in  thresholded  noise 
images.  The  input  grayscale  image  is  modeled  as  a two- 
dimensional  random  process  (stationary  random  field)  char- 
acterized by  its  mean  and  power  spectrum.  Tests  with  both 
synthetic  data  (smoothed  noise)  and  actual  data  were  con- 
ducted to  compare  the  predicted  and  measured  responses. 
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The  predictions  are  worst  for  thresholds  at  or  near  the 
mode  of  the  noise  distribution,  but  in  general,  the  compari- 
son showed  reasonable  agreement  between  the  predicted  and 
measured  values. 

b)  Predicting  edge  detector  response 

Statistical  response  prediction  for  edge  operators 
can  be  used  to  determine  the  nature  of  further  processing 
of  the  response.  If  edge  detector  output  is  to  be  thinned 
or  thresholded,  the  false  alarm  and  false  dismissal  rates 
depend  on  the  statistics  of  the  operator  responses.  A study 
has  been  conducted  [7]  which  discusses  the  statistical 
properties  of  the  outputs  of  some  edge  detectors  operating 
on  a general  class  of  images. 

The  image  model  considered  is  the  same  as  in  (a) 
just  above;  this  model  is  appropriate  for  predicting  the 
response  of  edge  detectors  to  background  noise.  The  edge 
detectors  analyzed  are  the  Laplacian  and  its  absolute  value, 
and  the  absolute  difference  of  averages  over  adjacent  2x2  and 
4x4  neighborhoods.  The  response  features  which  were  measured 
are  the  mean  edge  response  at  each  point,  the  variance,  the 
auto-covariance,  and  the  cross-covariance  of  gray  level 
and  edge  value  at  a number  of  displacements.  In  addition, 
the  density  of  the  local  maxima  of  edge  values  was  computed. 
Tests  using  a set  of  synthetic  background  images  showed  good 
conformity  to  the  predicted  features. 
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3.2.3  Edge  modeling 

a)  Optimal  edge  detection 

Many  optimality  criteria  have  been  proposed  for 
edge  detection.  Among  the  most  well  known  is  that  devised 
by  Hueckel  [8,9].  It  involves  fitting  an  ideal  parameterized 
step  edge  to  the  image  data  so  as  to  minimize  the  mean 
squared  error.  A new  optimal  detector  has  been  designed 
[10]  that  simplifies  several  assumptions  associated  with 
the  Hueckel  detector  and  thereby  solves  an  easier  optimiza- 
tion problem.  Specifically,  by  assuming  that  the  local  mean 
is  zero  and  the  local  variance  is  unity,  two  Hueckel  para- 
meters can  be  eliminated.  Further  simplifications  follow  if 
the  operator  can  be  applied  at  each  point  with  the  edge 
assumed  to  pass  through  the  center  (or  not  to  exist  at  all) . 

The  resulting  formulation  can  be  tuned  to  favor  edges  with 
known  a priori  probabilities.  The  computational  effort 
involved  in  applying  the  operator  may  be  reduced  by  solving 
the  associated  cubic  equation  using  a simple  iterative  approxi- 
mation, such  as  Newton's  formula.  Testing  on  actual  data 
has  verified  that  this  approach  provides  greater  sensitivity 
to  edge  orientation  than  a previously  proposed  [11] simpli- 
fication of  the  Hueckel  operator. 

b)  Evaluation  of  edge  operators 

The  Hueckel  operator  defined  in  [9]  has  been 
found  to  incorporate  a theoretical  flaw  leading  to  eccentric 
behavior  in  textured  images.  An  operator  which  is  conceptually 
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similar  but  apparently  more  dependable  has  been  defined  [12] . 
Comparative  tests  have  been  made  of  this  and  several  other 
operators  (including  Hueckel's  [9],  its  simplification  [11], 
and  the  new  optimal  operator  defined  in  [10] , as  well  as  the 
very  simple  Sobel  operator) , to  evaluate  their  adequacy  in 
obtaining  the  magnitude,  direction,  and  reliability  of  the 
edge  response  at  some  set  of  image  points , for  both  ideal 
and  distorted  images.  The  performances  of  these  operators 
were,  in  general,  closely  related  to  their  sizes  (and  hence, 
to  their  computational  costs).  All  of  the  local  operators 
were  able  to  detect  the  directions  of  distorted  edges  on 
small  (6x6)  domains  to  an  accuracy  of  about  10°,  and  their 
magnitudes  to  within  about  10%.  On  larger  (9x9)  domains, 
angular  resolution  was  improved,  but  ramps  became  significant 
as  a source  of  spurious  responses.  The  Sobel  operator  was 
judged  to  perform  better  than  the  operator  of  [11] . The 
operator  of  [10]  was  better  able  to  reject  ramps  on  larger 
domains , but  it  is  more  expensive  to  apply  than  the  other 
local  operators.  The  regional  operators  of  [9]  and  [12] 
performed  similarly;  the  latter  was  less  affected  by  the 
presence  of  imperfections. 


3-7 


3 . 3 Preprocessing 


Preprocessing  refers  to  those  transformations  applied 
to  the  raw  image  data  for  the  purpose  of  correcting,  simplify- 
ing and  regularizing  the  imagery.  The  resulting  images  should 
therefore  be  more  amenable  to  further  processing  and  more 
alike  in  certain  properties  essential  to  subsequent  algorithms. 
Thus,  for  example,  sampling  and  windowing  reduce  the  size 
of  the  image  to  be  processed.  Histogram  transformation  and 
requantization  convert  all  image  quantization  levels  to  a 
range  which  facilitates  feature  extraction.  Smoothing  re- 
inforces regional  uniformity  and  decreases  the  effects  of 
certain  kinds  of  noise. 

Preprocessing  steps  are  best  justified  by  the  problem 
environment  itself.  A knowledge  of  the  sensor  characteristics 
and  geometry  will  suggest  various  kinds  of  radiometric  and 
geometric  corrections.  For  example,  with  FLIR  data,  the 
image  is  best  understood  as  an  array  of  thermal  measurements. 

If  these  measurements  reflect  the  ground  truth,  then  much 
more  subtle  distinctions  can  be  made;  recognizing  that  a 
particular  temperature  is  beyond  the  normal  range  for  a 
vehicle  can,  perhaps,  indicate  that  the  vehicle  is  on  fire. 
Similarly,  a range  map  converting  pixels  to  actual  size/area 
measurements  can  allow  a viewer  or  a program  to  gauge  the 
size  of  a particular  region  and  thereby  discern  its  identity 
more  reliably. 
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In  the  problem  environment  at  hand  it  was  not  possible 
to  acquire  substantial  information  concerning  the  sensor  or 
the  imaging  situation,  due  to  classification  problems.  Thus 
the  "intelligent"  corrections  of  the  previous  paragraph  were 
impossible.  However,  several  preprocessing  steps  do  make 
sense.  The  following  subsections  describe  them. 
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3.3.1  Sampling 

According  to  the  sampling  theorem,  the  spacing  of  the 
data  points  should  be  half  the  size  of  the  smallest  feature 
to  be  detected.  Thus,  to  detect  objects  of  one  meter  on  a 
side,  pixels  should  correspond  to  one  half  meter  on  a side. 

In  practice,  however,  the  presence  of  noise  demands  that 
the  data  be  redundant  to  increase  the  reliability  of  the 
extraction  process.  A finer  resolution  can  often  provide 
this  redundancy.  Naturally,  the  price  must  be  paid  in 
additional  processing  time.  This  tradeoff  is  difficult  to 
model  analytically  especially  since  many  different  features 
are  extracted  from  an  image  and  their  relative  importance  is 
difficult  to. assess. 

Two  processes  for  region  extraction  are  paramount 
for  our  work  — thresholding  for  whole  region  extraction  and 
edge  detection  for  region  border  verification.  Of  these  two, 
edge  detection  is  more  sensitive  to  noise.  The  degree  of 
sampling  allowable  for  the  image  data  set  should  therefore 
not  be  so  great  that  reliable  edge  extraction  is  compromised. 

A 2-to-l  size  reduction  (eliminating  every  other  row  and 
column)  was  found  to  be  compatible  with  reliable  edge  detection. 
In  the  unsampled  images,  the  average  edge  ramp  cross-section 
was  found  to  be  about  5 pixels  wide;  thus  a 2-to-l  reduction 
gave  about  a 3-pixel  edge  ramp  which  was  consistent  with  the 
need  to  localize  edges  fairly  accurately. 
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An  alternative  approach  attempted  to  reduce  high 
frequency  noise  by  extracting  windows  based  on  2x2  averaging 
rather  than  sampling.  Thus,  instead  of  discarding  every 
other  row  and  column,  each  pixel  in  the  sampled  image  was 
the  average  over  a (disjoint)  2x2  neighborhood  in  the 
original  image.  A smooth,  less  noisy  image  was  produced  and 
row  dropouts  were  partially  eliminated.  However,  the  images 
seemed  to  have  less  contrast.  Sampling  followed  by  smoothing 
appears  to  be  better  than  smoothing  followed  by  sampling. 

A major  emphasis  in  the  project  has  been  the  detection 
of  small  or  faint  targets.  For  this  reason,  the  sampled 
images  were  also  windowed  so  as  to  capture  the  target  regions 
and  to  further  reduce  the  computational  load.  Naturally,  one 
must  avoid  techniques  which  assume  that  each  window  contains 
exactly  one  target  in  its  central  region.  This  dilemma 
asserts  itself  in  subtle  ways.  Statistical  properties  of 
the  window,  e.g.,  histogram,  central  moments,  etc.  are  good 
predictors  of  object  presence,  threshold,  etc.  However, 
they  cannot  be  employed  in  practice  unless  window  size  is 
correctly  estimated  and  window  border  situations  are  handled. 
Our  approach  does  not  depend  on  window  size  (or  frame  size) 
and  therefore  windowing  is  an  appropriate  preprocessing  step. 
Note,  however,  that  estimates  of  the  false  alarm  rate  cannot 
be  reliably  derived  solely  from  target  windows.  For  this 
reason,  small  noise  windows  (containing  no  targets)  and 
large  windows  (consisting  mainly  of  background  clutter)  were 
also  processed. 
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3.3.2  Histogram  transformations  and  adaptive  quantization 

Sensor  output  is  related  to  actual  phenomena  according 
to  physical  laws.  If  this  correspondence  is  well  under- 
stood beforehand,  it  is  possible  to  correct  and  transform 
the  data  to  improve  subsequent  processing.  Thus  if  FLIR 
data  could  be  used  to  estimate  reliably  the  temperature  of 
objects,  then  quite  stringent  tests  could  be  made  to  enhance 
recognition  rates.  Unfortunately,  the  analytic  interpretation 
of  FLIR  data  at  long  range  is  complicated  by  many  effects 
such  as  sun-angle,  wind,  smoke,  surface  composition,  etc. 
Furthermore,  the  sensor  hardware  itself  is  subject  to  un- 
predictable electronic  noise,  disturbances  and  failures. 

Only  some  of  these  effects  can  be  alleviated  and  then  (due 
in  part  to  the  classified  nature  of  the  sensor)  only  statisti- 
cally . 

Among  the  conventional  gray  level  modifications  con- 
sidered useful  for  producing  more  manageable  imagery  are  the 
rather  simple  histogram  mapping  techniques.  Figure  3.3.1a 
illustrates  the  gray  level  histogram  of  an  unmodified  image. 

The  gray  level  range  is  defined  by  eight  bits-256  gray  levels  — 
and  can  be  seen  to  exhibit  significant  non-uniformities  of 
response.  Moreover,  from  a processing  point  of  view,  256 
gray  levels  do  not  effectively  reflect  the  true  gray  level 
range  and  contrast.  A simple  2-bit  shift  operation,  converting 
8-bit  pixels  to  6 bits,  has  the  effect  seen  in  Figure  3.3.1b 
of  smoothing  the  histogram  while  reducing  the  gray  level 
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Figure  3.3.1b.  Same  histogram  re- 
quantized to  64 
levels . 


I 
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range  to  64  gray  levels.  This  technique  if  continued  for 
further  shifts  would  ultimately  combine  significant  peaks 
corresponding  to  object/background  contrast.  However,  the 
conversion  from  8-bit  to  6 -bit  was  found  to  be  justified,  as 
it  alleviated  non-uniform  sensor  response  without  destroying 
target  discriminability . 

If  one  assumes  that  a scene  consists  of  the 
juxtaposition  of  objects  of  uniform  temperature  taken  from  a 
small  number  of  such  temperatures,  then  it  is  possible  to 
convert  the  image  into  one  with  only  a few  different  gray 
levels  present.  An  attempt  at  adaptive  requantization  is 
described  in  [13] . Briefly,  an  iterative  process 

constructs  a new  histogram  from  the  previous  version  by 
identifying  gray  level  peaks  and  having  them  gain  strength 
(Le.  , points)  from  neighboring  non-peaks  while  the  non-peak 
areas  are  thereby  depleted.  The  result  is  a mapping  from 
the  original  gray  level  domain  to  a new  sparse  set  of  gray 
levels.  The  resulting  requantized  images  (Figure  3.3.2) 
seem  not  to  have  lost  object/background  discriminability. 
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Figure  3.3.2.  Result  of  four  iterations  of  the 

peak  sharpening  process  using 
neighborhood  sizes  of  2,  3,  and 
4 for  (a-b)  , sizes  2,  4,  and  5 
for  (c)  . 
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3.3.3  Image  smoothing 

In  the  previous  section,  preprocessing  steps  were 
described  which  contributed  to  the  interpretation  of  a scene 
as  a mosaic  of  uniform  sensor  responses.  The  techniques 
considered  the  gray  level  population  only.  Proximity  was  not 
involved.  In  this  section,  we  discuss  attempts  to  smooth 
the  image  spatially  so  that  nearby  points  from  the  same 
region  will  have  more  nearly  identical  gray  levels.  There 
are  a number  of  justifications  for  spatial  image  smoothing. 
First,  by  making  the  image  spatially  more  uniform,  one 
increases  the  probability  that  points  belonging  to  the  same 
region  will  be  treated  identically.  Thus,  the  point  sets 
extracted  by  thresholding  will  appear  better  defined  with 
fewer  pinholes  and  fewer  isolated  points.  This  is  reasonable 
since  the  chosen  image  resolution  is  intended  to  cover  any 
object  with  numbers  of  pixels.  The  second  reason  for  smooth- 
ing is  to  eliminate  insignificant  local  changes  of  contrast. 
Otherwise  the  output  of  edge  detection  operations  based  on 
differencing  would  contain  many  tiny  spurious  edges  which 
tend  to  obscure  the  proper  edge  signals.  Third,  the  statisti- 
cal properties  of  a smoothed  image  are  more  representative 
of  the  true  situation.  Thus,  many  decisions  based  on  the 
statistics  of  the  smoothed  image  are  more  reliable. 

Two  methods  of  image  smoothing  were  investigated. 

In  a first  attempt,  the  mean  value  of  a fixed  neighborhood 
about  each  point  replaced  the  point's  value.  Figure  3.3.3a 
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shows  the  effect  of  replacing  each  point  of  a step  edge  by 
its  mean  value  (blurring) . As  is  evident,  blurring  smears 
edges.  Figure  3.3.4a-d  illustrates  blurring  for  several  target 
windows,  and  also  shows  the  histograms  of  these  windows  before 
and  after  blurring.  Note  that  blurring  tends  to  blend  peaks 
in  histograms,  thus  making  thresholding  more  difficult.  Also, 
small  faint  objects  tend  to  become  less  distinct. 

A second  approach  to  image  smoothing  has  the  property 
of  preserving  edges.  At  each  point  of  an  image,  the  median 
value  of  the  gray  levels  over  a kxk  neighborhood  is  computed. 
The  value  of  k depends  on  the  amount  of  local  noise  variation. 
For  the  original  images,  a 5x5  neighborhood  size  was  chosen. 
Figure  3.3.3b  illustrates  the  effects  of  median  filtering  or. 
a step  edge.  Note  that  the  median  does  not  increase  the 
ramp  width.  Thus  edges  do  not  smear.  This  is  demonstrated 

in  the  two-dimensional  case  in  Figures  3. 3. 5-3. 3. 7 for  a 

tank  image.  Median  filtering  does,  however,  round  off  sharp 

corners.  This  was  not  a serious  problem  in  this  data  base. 

Figure  3.3.4e-f  illustrates  a number  of  median  filtered 

windows  and  their  histograms.  The  general  algorithm  for 

median  computation  over  k points  is  of  order  k . However, 

better  results  may  be  obtained  when  evaluating  a running 

median,  by  making  use  of  the  high  autocorrelation  of  gray 

2 

level  in  most  images.  The  cumulative  histogram  of  the  k 
data  points  is  maintained  in  a vector  of  length  d (e.g.,  d=64). 
The  k deletions  and  k insertions  are  interleaved  in  pairs. 
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a. 


Mean  filtering 


b.  Median  filtering 


Figure  3.3.3.  Effect  of  filtering  on 

step  edges  using  a five 
point  neighborhood. 
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c.  3x3  mean  filtered 
windows . 


d.  Histograms  of  (c) . 


Image  Reference: 
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26N 
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Figure  3.3.4.  Comparison  of  mean  and  median 

filtering . 
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e.  3x3  median  filtered 
windows . 


Figure  3.3.4  (continued) 
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Figure  3.3 


a.  Original  FLIR  image  of  a tank. 

Note  the  noise  content  and  the 
presence  of  a thin  noise  line  at 
the  upper  left. 

b.  Mean  filtered  image  using  a 5x5 
square  neighborhood  at  each 
point.  The  tank  appears  blurred, 
as  does  the  border  between  the 
road  (dark)  and  the  grass  (light) . 
The  thin  noise  line  is  smeared 
into  the  background . 

c.  Median  filtered  image  using  a 5x5 
square  neighborhood.  The  tank 
contours  appear  sharper,  while 
overall  the  image  has  been 
smoothed. 
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a. 


b. 


c. 


Figure  3.3.6.  Results  of  Edge  Detection 

Each  of  the  windows  was  subjected 
to  an  edge  detection  operation  which 
detects  the  most  significant  edge 
at  each  point  over  four  orientations. 

Note  that  edges  surround  the  various 
regions  in  the  image  but  that  the 
edges  in  the  median  filtered  image 
(c)  are  sharper  and  have  more  con- 
trast than  those  in  the  mean  filtered 
image  (b) . 

a.  Edge  detection  response  for  the 
original  image. 

b.  Same  as  above  for  the  mean  filtered 
image . 

c.  Same  as  above  for  the  median  filtered 
image . 
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a. 


b. 


c. 


Figure  3.3.7.  Edge  Cross  Sections 

A single  line  of  the  edge  detection 
image  passing  through  the  tank  is 
displayed  with  height  corresponding 
to  edge  value.  Notice  that  the 
median  filtered  response  exhibits  a 
thinner,  higher  peak  corresponding  to 
a sharper,  more  contrasting  edge 
than  the  mean  filtered  response. 

a.  A single  line  of  edge  response  from 
Figure  3.3.6a. 

b.  Same  as  above  for  Figure  3.3.fb. 

c.  Same  as  above  for  Figure  3.3.bc. 
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Each  (deletion,  insertion)  pair  isolates  a region  of  the 
vector  which  must  be  modified.  The  smaller  this  region  on 
the  average,  the  less  work  to  be  done.  If  the  deletion  and 
insertion  in  a given  pair  affect  the  same  bin,  no  change  is 
necessary.  The  length  of  the  region  of  change  in  the  cumula- 
tive histogram  is  the  expected  gray  level  difference  of 
points  at  distance  k.  This  corresponds  to  a variogram  value, 
v(k).  After  updating,  the  vector  is  binary-searched  for  the 
median.  Thus  the  number  of  vector  operations  is  k*v(k), 
followed  by  log  d operations  to  binary-search  the  updated 
vector.  The  sum  k*v(k)  + log  d should  be  quite  small  for 
relatively  smooth  images. 
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3. 4 Edge  detection 

The  extraction  of  edge  features  has  proved  useful  in 
a number  of  project  areas.  Section  3.5  will  describe  thresh- 
old selection  methods  which  utilize  edge  information. 

Edges  are  used  also  in  the  critical  step  of  the  Superslice 
algorithm  (Section  3.7).  In  this  section,  we  discuss  the 
variety  of  edge  operations  investigated  and  a method  for 
thinning  edge  response  so  as  to  locate  the  apparent  edge . 
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3.4.1  Comparison  of  methods 


Methods  for  edge  detection  abound  in  the  literature 
(for  a survey,  see  (14] ) . Some  of  the  simplest  methods 
involve  convolutions  of  templates  with  an  image.  A number 
of  these  were  considered  in  the  current  work.  These  include: 

Laplacian:  |e  - (a+b+c+d+f+g+h+i)/8 | , where  the 

neighborhood  of  e is 

a b c 
deft 
g h i u 
v w 

Roberts  Gradient:  max{|a-e|,  |b-d|). 

Three-by-three:  max{ | a+b+c-g-h-i | , | a+d+g-c-f-i | } 

2x2  Difference: 

l/4*max( |d+e+g+h-f-t-i-u | , | b+c+e+f-h-i-v-w | }. 

(In  other  words,  the  value  corresponds  to  the 
maximum  of  the  differences  between  2x2  averages 
over  adjacent  pairs  of  horizontal  and  vertical 
neighborhoods.  This  scheme  extends  to  diagonals  also. 

4x4  Difference:  This  is  the  same  as  the  2x2  difference 
except  that  averages  are  taken  over  4x4  neighborhoods. 

8x8  Difference:  The  same  as  the  previous  except  that 
averages  are  taken  over  8x8  neighborhoods . 

Experiments  with  these  operators  indicated  that  the 
Laplacian  (which  is  a second  difference) , the  Roberts  Gradient 
and  the  3x3  Gradient  were  too  sensitive  to  minute  changes  in 
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gray  level.  The  differences  of  averages  operators  produced 
better  output  by  virtue  of  the  increased  amount  of  smoothing 
on  each  side  of  the  edge.  Knowledge  that  typical  edge  width 
in  the  windows  was  three  pixels  suggested  that  the  4x4 
operator  could  span  the  edge  ramp  (to  give  the  maximum 
gradient  value)  while  remaining  sensitive  enough  to  detect 
the  edges  of  small  faint  regions. 
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3.4.2  Edge  thinning 

In  the  world  of  man-made  objects,  edges  correspond 
to  the  juxtaposition  of  surfaces  or  shadows.  In  a well 
focused  image,  edges  should  appear  sharp  and  should  extend 
in  some  direction  for  some  length.  (In  the  natural  world, 
the  boundaries  of  regions  are  not  necessarily  as  sharply 
defined,  e.g.,  for  trees,  fields,  etc.)  The  output  of  the 
operators  described  in  the  previous  section,  however,  are 
generally  smeared  at  or  near  the  true  edge  location.  None- 
theless, for  certain  types  of  image  understanding  it  is 
necessary  to  localize  the  edge  so  that  it  lies  along  the 
object  boundaries.  Given  a knowledge  of  the  edge  detector, 
it  is  possible  to  design  a process  which  accepts  the  output 
of  the  operator  and  which  produces  a thinned  representation 
of  the  edge  at  the  location  of  maximum  edge  response. 

It  is  not  sufficient  to  consider  simply  those  points 
of  maximum  response,  since  this  would  force  adjacent  points 
in  the  direction  of  the  edge  to  compete.  This  problem  can 
be  alleviated  by  taking  into  account  the  computed  local 
direction  of  the  edge  and  by  placing  into  competition  only 
those  points  which  are  normal  to  the  direction  of  the  edge. 

In  practice,  a directional  mask  is  associated  with  each 
edge  point  oriented  normal  to  the  direction  of  the  maximum 
edge  response  at  that  point.  The  center  point  is  then  deleted 
(assigned  zero  response)  if  any  point  within  the  mask  has  a 
greater  response.  There  are  four  masks: 
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XXX  X X 

XX  X XX  XX 

xxOxx,  0 , 0 , 0 

X X X XX  XX 

XXX  X X 


one  associated  with  each  principal  edge  direction, 
process,  called  "non-maximum  suppression,"  operates 
taneously  on  all  edge  values  to  produce  a "thinned" 
Figure  3.4.1  illustrates  the  process. 


The 

simul- 
edge  map. 
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Originals . 


Edge  detector  output. 


Thinned  edge  map.  .Thinned  edge  map  thresholded 

to  display  only  edge  values 

> 2. 


Figure  3.4.1.  Results  of  edge  detection  and  non-maximum 

suppression  on  43  tank  windows. 
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Originals . 


Edge  detector  output. 


Thinned  edge  map. 


Thinned  edge  map  thresholded 
to  display  only  edge  values 
> 2. 


Figure  3.4.1  (continued) 
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Originals . 


Edge  detector  output. 


Thinned  edge  map. 


Thinned  edge  map  thresholded 
to  display  only  edge  values 
> 2. 


Figure  3.4.1  (continued) 
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Threshold  selection 


The  properties  of  a pixel  in  a single  sensor  image 
are  its  position  and  its  gray  level.  Our  knowledge  of  the 
imaging  environment  allows  us  to  predict  an  object's  gray 
level  more  accurately  than  its  position.  In  fact,  the  whole 
point  of  cueing  is  to  locate  a target.  Thus  one  has  little 
a priori  information  about  target  position;  however,  inasmuch 
as  gray  level  is  related  to  thermal  emission  in  FLIR  data, 
there  are  some  fairly  powerful  heuristics  available  to  aid 
in  target  recognition.  For  example,  we  may  choose  to  assume 
that  operating  vehicles  are  warmer  than  the  immediate  back- 
ground and  that  they  radiate  uniformly  over  their  surfaces . 
Naturally,  such  assumptions  are  not  always  possible.  In 
cold  weather,  metal  loses  heat  faster  than  the  ground;  at 
close  range,  fine  thermal  detail  is  visible  and  the  uniformity 
assumption  fails.  Nor  are  these  assumptions  meant  to  be 
exclusive,  e.g. , we  do  not  claim  that  every  object  region 
warmer  than  its  surround  is  a target.  The  power  of  these 
heuristics  is  to  suggest  approaches  which  capture  essential 
problem  domain  knowledge. 

In  this  project,  the  force  of  the  heuristics  of  the 
previous  paragraph  is  to  emphasize  methods  which  isolate 
distinct  gray  level  regions  from  their  surrounds.  The 
simplest  of  such  methods  is  thresholding  — the  assignment 
of  all  points  whose  gray  levels  are  greater  than  a pre- 
determined level  (the  threshold)  into  a single  class  of 
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potential  object  points.  The  filtering,  aggregation  and 
ultimate  classification  of  these  points  are  the  subjects 
of  subsequent  sections.  In  this  section,  we  discuss  our 
investigations  of  numerous  methods  for  single  and  multiple 
threshold  selection. 

There  is  a progression  in  these  methods  which  corres- 
ponds on  the  one  hand  to  the  need  for  increased  sensitivity 
in  choosing  the  "right"  threshold  and,  on  the  other  hand,  to 
the  deemphasis  of  the  committment  to  that  particular  thresh- 
old. However,  we  still  retain  the  notion  that  for  each 
object  region  in  the  image  there  is  a "best"  threshold.  In 
the  worst  case,  every  possible  threshold  yields  a target 
region  which  is  invisible  (unextractable)  at  every  other 
threshold.  One  must  therefore  be  prepared  to  threshold  at 
any  gray  level  and  within  that  thresholded  image  to  discern 
the  target  regions  and  to  ignore  the  noise  regions.  These 
last  comments  appear  to  call  for  the  selection  of  every  gray 
level  as  a value  at  which  to  threshold.  Indeed,  given 
sufficient  parallel  hardware  and  powerful  target/noise 
discrimination  criteria,  this  brute  force  approach  could  lead 
to  a reliable  and  sensitive  target  cuer.  A further  dis- 
cussion of  this  option  is  in  Section  3.7  and  some  relevant 
experiments  are  to  be  found  in  Section  3.9.  The  remainder 
of  this  section  describes  techniques  for  finding  appropriate 
thresholds  when  hardware  and  throughput  considerations  allow 
only  a few  thresholds  to  be  utilized  per  frame. 
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3.5.1  Threshold  selection  based  on  edge  values 

In  Section  3.2.1  a model  was  proposed  for  images 
consisting  of  objects  and  background,  each  with  characteristic 
gray  level  distributions.  If  the  gray  level  histogram  of 
the  image  is  markedly  bimodal,  one  may  choose  the  threshold 
at  the  valley  between  the  two  peaks  (possibly  shifted  towards 
the  smaller  peak  when  using  a maximum  likelihood  estimate) . 
However,  the  smaller  the  object,  the  less  likely  the  histogram 
is  to  exhibit  strong  bimodality.  The  background  distribution 
engulfs  the  object's  gray  level  range  and  tests  for  bi- 
modality are  inconclusive. 

One  approach  [15]  to  solving  this  problem  has  been  to 
select  from  the  original  image  a set  of  points  that  are  as 
likely  to  fall  within  the  object  as  within  the  background. 

If  one  examines  the  output  of  operators  which  respond  to 
edges,  then  high  values  should  correspond  to  points  falling 
at  or  near  object  edges.  The  mathematical  model  has  shown 
the  gray  level  distribution  to  be  unimodal  with  a peak  at 
the  mean.  Thus,  these  points  are  as  likely  to  lie  on  the 
object  as  on  the  background  and  their  mean  value  should 
correspond  to  the  desired  threshold. 

A brief  description  of  the  threshold  selection  method 
is  as  follows:  Let  e(i,j)  be  the  edge  value  computed  at  (i,j)> 
and  let  g(i,j)  be  its  gray  value.  Then  the  chosen  threshold 
is  g = AVG(g (i , j ) | e (i , j ) ^ t) , where  t is  lowest  edge  value 
considered  significant.  Computationally,  two  arrays  are 
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needed.  One  array,  TOTALQ , . . . , TOTALg ^ , accumulates  the  gray 

level  g for  each  edge  value  e between  0 and  63;  i.e., 

TOTALe  = TOTALe  + g.  The  second  array,  Nq,...,^^,  tallies 

the  number  of  points  at  each  edge  value.  The  desired  average 

gray  level  g = ( \ TOTAL . ) / ( \ N . ) . 

i*t  1 i*t  1 

Two  parameters  were  treated  in  the  experimental  work; 
the  choice  of  edge  operator  and  the  edge  significance  level 
t.  Previous  work  with  edge  operators  indicated  that  the 
4x4  difference  of  averages  operator  was  superior  to  the 
others  as  an  edge  detector  in  FLIR  scenes.  Experiments  in 
threshold  selection  showed  that  thresholds  chosen  based  on 
this  operator  were  better  overall  [1] . 

The  proper  selection  of  a value  for  t is  important 
because  this  parameter  controls  the  size  and  quality  of  the 
sample  points  of  high  edge  value  used  to  compute  the  gray 
level  threshold.  Setting  t too  high  decreases  the  statisti- 
cal reliability  of  the  sample;  while  a small  t may  admit  too 
many  noise  values . The  choice  of  t depends  on  the  expected 
amount  of  object  edge.  Obviously,  many  assumptions  are  built 
into  this  notion,  e.g.,  that  the  window  contains  only  a 
single  object  of  known  size,  shape,  contrast,  resolution,  etc. 
In  a tactical  situation,  one  could  make  estimates  of  these 
parameters  based  on  situation  data.  Based  on  estimates  of 
target  size,  t was  chosen  as  the  edge  value  corresponding  to 
the  95^  percentile.  This  estimate  was  shown  by  experiment 
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to  provide  good  thresholds  for  the  windowed  data  set.  This 
is  illustrated  in  Figure  3.5.1. 

The  sensitivity  of  the  chosen  gray  level  threshold 
to  different  choices  of  t was  tested  and  a graph  of  the 
threshold  was  plotted  as  a function  of  the  gradient  cutoff 
t;  see  Figure  3.5.2.  There  is  a tendency  for  this  graph  to 
drift  toward  the  mean  gray  value  as  t is  decreased.  The 
chosen  threshold  is  stable  for  large  objects.  For  small 
objects,  the  choice  is  quite  sensitive  to  the  bin  size. 

The  approach  above  and  several  variations  [ 5 ] can 
viewed  as  methods  of  decision  surface  selection  in  (gray 
level,  edge  value)  space.  This  space  is  visualized  as  a 
two  dimensional  histogram  with  gray  level  along  one  axis  and 
edge  value  along  the  other.  Figure  3.5.3  displays  such  a 
2-D  histogram  for  a hypothetical  object  on  a background. 
Points  at  A represent  background  while  object  points  (perhaps 
with  some  noise)  cluster  at  B.  The  bottom  part  of  the  U- 
shaped  region  contains  high-edge  value  points.  As  we  have 
pointed  out,  the  average  gray  level  of  these  points  is  a 
good  threshold.  Figure  3.5.1  illustrated  2-D  histograms  for 
several  target  windows. 

One  may  consider  a threshold  as  a vertical  decision 
surface  separating  object  from  background.  Non-vertical 
partitions  of  the  space  have  also  been  investigated  [ 2 ] 
and  were  found  to  be  capable  of  adding  more  points  to  the 
boundaries  of  object  regions  without  substantially  increasing 
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2-D  Histograms. 


Figure  3.5.1. 
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Figure  3.5.1  (continued) 
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Figure  3.5.1  (continued) 
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Figure  3.5.2  (continued) 
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Figure  3.5.3.  Ideal  2-D  histogram  of  a scene 

containing  an  object  and  back- 
ground with  noise. 
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the  amount  of  noise.  Several  other  partitioning  schemes 
which  were  considered  are  discussed  in  [5] . 
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3.5.2  Slice  range  selection 

The  methods  discussed  above  predict  a single  thresh- 
old to  be  applied  to  an  image.  For  images  known  to  contain 
a single  object  class  or  for  small  windows,  the  use  of  a 
single  threshold  is  appropriate.  However,  in  general,  no 
single  threshold  will  separate  all  objects  of  interest  from 
the  background.  It  is  therefore  necessary  to  extend  our 
threshold  selection  concept  to  allow  the  choice  of  multiple 
thresholds . 

Our  approach  is  to  produce  clusters  of  points  corres- 
ponding to  region  borders  and  to  associate  the  average  gray 
level  of  each  cluster  with  a threshold  for  the  corresponding 
region.  Edge  detectors  select  at  each  point  the  maximum 
difference  of  averages  of  adjacent  neighborhoods  over  several 
directions.  By  suppressing  non-maximum  responses  normal  to 
the  selected  direction  (i.e.,  across  the  edge),  thin  contours 
result  which  appear  to  surround  object  regions  (see  Section 
3.7.2).  A by-product  of  this  process  are  points  with  very 
low  edge  value,  including  values  which  truncate  to  zero. 

Such  points  correspond  to  the  interiors  of  homogeneous  regions. 
Figure  3.5.4  illustrates  thinned  detector  responses  with 
region  interior  maxima  included.  After  thinning,  each  re- 
maining point  is  plotted  using  edge  value  and  average  gray 
level  in  a two-dimensional  histogram.  Figure  3.5.5  shows 
examples  of  images  together  with  their  2-D  histograms  based 
on  thinned  edges. 
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a. 


b. 


Figure  3.5.4a. 

b. 


LANDSAT  window  of  Monterey,  CA. 
Thinned  edge  detector  response 
(thresholded) . 


e . 


f . 


g- 


h. 


Figure  3.5.5a. 


b. 


c. 

d. 

e . 

f . 

g- 

h. 


Disk  (gray  level  30)  within  ring  (gray 
level  40)  within  background  (gray 
level  20) . 

2-D  histogram  of  (a)  with  gray  level  as 
x-axis  (increasing  left  to  right)  and 
edge  value  as  y-axis  (stretched  — in- 
creasing from  top  to  bottom) . Interior 
of  background  is  leftmost,  topmost 
cluster. 

Window  containing  house. 

2-D  histogram  of  (c) . 

Window  containing  tank. 

2-D  (stretched)  histogram  of  (e) . 
LANDSAT  window  of  Monterey. 

2-D  histogram  (thinned  edge  vs.  average 
gray  level) . 
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Two  types  of  clusters  are  produced:  interior  clusters 


represent  the  interiors  of  regions,  edge  clusters  represent 
boundaries  between  regions.  The  size  of  a cluster  (i.e., 
the  number  of  points  in  it)  is  closely  related  to  properties 
of  the  region  it  describes.  Thus  interior  clusters  relate 
both  to  the  area  of  the  region  and  to  the  size  of  the  neigh- 
borhood over  which  the  local  operations  (edge  detection, 
non-maximum  suppression)  are  defined.  For  small  object  regions 
there  may  be  no  points  sufficiently  far  from  the  object 
boundary  to  resist  suppression.  Thus,  interior  clusters  may 
be  indistinguishable  from  noise,  or  may  be  nonexistent. 

Clusters  of  points  at  higher  edge  values  are  more 
likely  to  be  significant  (based  on  our  homogeneity  assumptions) 
The  size  of  an  edge  cluster  is  therefore  related  to  the 
perimeter  of  the  surrounded  region  in  the  image.  Since  peri- 
meter increases  (roughly,  for  digital  images)  as  the  square 
root  of  area,  the  edge  clusters  for  objects  of  moderately 
different  areas  should,  nonetheless,  be  of  comparable  size. 

A priori  estimates  of  size  are  of  use  in  discriminating  true 
edge  clusters  from  random  noise. 

Each  edge  cluster  corresponds  (ideally)  to  these 
interior  clusters  whose  locations  can  be  determined  from  the 
location  of  the  edge  cluster.  Thus  a threshold  derived  from 
the  edge  cluster  will  separate  the  interior  clusters.  How- 
ever, care  must  be  taken  not  to  split  an  interior  cluster 
at  a threshold  since  this  introduces  random  noise  regions. 
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Figure  3.5.6  illustrates  a compound  decision  surface  in  the 
2-D  histogram  of  a multi-object  image.  For  further  discussion 
see  [16]  . 
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Figure  3.5.6a. 


b. 

c. 


Adjacent  object  regions  on 
background  (same  as  Figure 
3.5.5a) . 

2-D  histogram. 

2-D  histogram  partitioned  into 
classification  regions. 
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3.5.3  Variable  thresholding 

Previous  approaches  to  thresholding  apply  the  same 
threshold  to  all  points  of  the  image.  In  [17] , Nakagawa 
adapts  the  work  of  Chow  and  Kaneko  [18]  to  interpolate  a 
best  threshold  for  each  point  of  the  image.  Briefly,  the 
image  is  divided  into  small  windows  (say  32x32)  and  a test 
of  gray  level  histogram  bimodality  is  made  for  each  window. 

A best  threshold  is  chosen  for  each  bimodal  window  and 
thresholds  are  interpolated  to  all  image  points.  A binary 
image  results  when  each  threshold  is  applied  to  its  corres- 
ponding pixel  value.  Figure  3.5.7  compares  fixed  thresholding 
and  variable  thresholding  for  several  FLIR  frames.  Nakagawa 
extended  this  method  to  allow  multiple  thresholds (for  multi- 
object adjacencies).  Figure  3.5.8  illustrates  the  results. 
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b. 


Figure  3.5.7.  Comparison  of  fixed  and  variable 

thresholding. 

a.  Two  FLIR  frames. 

b.  Two-Gaussian  approximations  to  those 
32x32  window  histograms  that  were 
judged  to  be  bimodal. 
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Figure  3.5.7  ( continued ) 


c.  Results  of  applying  the  interpolated 
point  thresholds  to  (a) . 

d.  Results  of  applying  a fixed  threshold 
to  (a) . 
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c. 


d. 


Figure  3.5.8. 


a . 

b. 


c. 


d. 


Comparison  of  fixed  and  variable 
thresholding. 

Machine  parts  image. 

Two-  and  three-Gaussian  approximations 
to  those  window  histograms  that  were 
judged  to  be  bi-  and  tri-modal. 

Three  level  pictures  obtained  after 
interpolating  the  multiple  thresholds 
determined  from  (b)  and  applied  to  (a) . 
Results  of  applying  a fixed  threshold 
to  (a)  . 
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3 . 6 Noise  cleaning  and  component  labelling 

3.6.1  Shrink/expand  and  min/max 

The  result  of  thresholding  is  a binary  valued  image. 

It  often  contains  isolated  points  and  small  noise  regions 
which  are  artifacts  of  the  thresholding  and  may  not  be 
readily  visible  in  the  original  image.  Smoothed  images 
tend  to  have  fewer  (but  larger)  noise  regions.  One  may 
delete  noise  regions  by  postprocessing  the  thresholded  image. 

The  method  consists  of  multiple  applications  of  two 
processes:  "shrink"  and  "expand."  The  purpose  of  the 

sequence  of  shrinks  is  to  shrink  objects  in  a uniform  manner 
so  that  small  or  insubstantial  objects  disappear  entirely. 

The  sequence  of  expands  is  meant  to  regrow  the  remaining 
shrunken  objects  to  their  original  size.  The  result  of  the 
shrinks/expands  is  the  elimination  of  tiny  regions  (presumed 
to  be  noise  regions) . 

Each  shrink  or  expand  requires  the  simultaneous  or 
"parallel"  application  of  a local  replacement  rule  at  every 
point  of  the  thresholded  image.  The  form  of  the  shrink  rule 
is  as  follows:  Eliminate  all  l's  adjacent  to  0's.  Zero 
values  are  unchanged.  Such  a rule  decreases  the  number  of 
l's  in  the  thresholded  image;  thus,  the  image  "shrinks." 

Only  l's  surrounded  by  l's  will  survive  a shrink.  The 
number  of  successive  shrinks  determines  the  minimum  dia- 
meter of  a surviving  region. 

The  expand  rule  is  similar  to  the  shrink  rule: 
rewrite  a 0 as  a 1 if  any  of  its  neighbors  are  l's,  but 
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leave  l's  unchanged.  Thus  points  adjacent  to  l's  become 
l's,  thereby  increasing  the  number  of  l's.  If  we  wish  to 
restore  objects  (that  were  not  eliminated)  to  about  their 
original  sizes,  t shrinks  should  be  followed  by  t expands. 

Such  a shrink/expand  sequence  produces  an  image  whose  1 ' s 
correspond  to  (a  subset  of  the)  l's  in  the  untransformed 
binary  image.  Thus,  for  example,  isolated  l's  are  eliminated, 
and  objects  joined  by  narrow  necks  of  l's  may  become  dis- 
connected. Also,  thin  protrusions  from  a region  of  l's 
will  disappear.  Figure  3.6.1  illustrates  the  shrink/expand 
algorithm  for  both  the  4 and  8 neighbor  cases  and  t = 1,2,3 
(the  numbers  of  shrinks  and  expands  used) . 

A generalization  of  the  shrink  rule  was  formulated 
to  fill  pinholes  and  conserve  small  region  shape  as  follows: 
delete  a 1 if  at  least  k of  its  neighbors  are  0's  ( O's 
remain  unchanged) . The  original  shrink  rule  corresponds  to 
k = 1.  If  k > 1,  it  takes  more  zero  evidence  to  convert 
a 1 to  0 . The  generalized  expand  is  analogously  defined: 
Rewrite  a 0 as  1 if  it  has  at  least  k l's  as  neighbors  ( l's 
remain  unchanged) . 

However,  the  generalized  expand  rule  is  not  quite  as 
generous  in  providing  new  1 values,  although  it  does  fill 
pinholes  in  sufficiently  large  regions.  Figure  3.6.2  pro- 
vides a comparison  for  t=  1,  2 and  k = 1,  2,  3.  The  shrink/ 
expand  rule  with  t = 2 and  k = 3 applied  to  each  image  point 
and  its  8-neighbors  provides  efficient  noise  cleaning  with 
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Figure  3.6.1. 

Effects  of  iterating  SHRINK/EXPANDS  (S/E's). 

a. 

Original  images  - each  column  is  a single 
image  thresholded  at  four  different  values. 

b. 

4-neighbor  rule  - one  S/E 

c . 

4-neighbor  rule  - two  S/E's 

d. 

4-neighbor  rule  - three  S/E's 

* e . 

8 -neighbor  rule  - one  S/E 

f . 

8-neighbor  rule  - two  S/E's 

g- 

8-neighbor  rule  - three  S/E's 
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Figure  3.6.1  (continued) 
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Figure  3.6.1  (continued) 
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Figure  3.6.2.  Leniency  in  SHRINK/EXPAND  definitions 

for  windows  thresholded  by  two  methods. 


a. 

4-neighbor 

rule , 

one 

S/E,  k=l ,2,3 

b. 

8-neighbor 

rule , 

one 

S/E,  k=l , 2 , 3 

c. 

4-neighbor 

rule , 

two 

S/E's,  k=l , 2 , 3 

d. 

8 -neighbor 

rule , 

two 

S/E's,  k=l , 2 , 3 
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Figure  3.6.2  ( continued ) 


3-63 


c. 


d. 


Figure  3.6.2  (continued) 
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most  noise  regions  eliminated,  pinholes  filled,  and  only  a 
modest  amount  of  target  shape  distortion. 

One  may  further  generalize  this  process  for  application 
prior  to  thresholding.  This  technique,  called  "precleaning", 
involves  a sequence  of  local  MIN/MAX  operations  applied  to 
the  gray  level  image  (analogous  to  shrink/expand  applied  to 
a binary  image) . The  resulting  precleaned  image  may  now 
be  thresholded  as  desired.  The  above  threshold  regions  are 
as  they  would  have  appeared  after  shrink/expand  processing. 
Figure  3.6.3  illustrates  the  process.  This  work  is  described 
in  [19]. 
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Key: 


8-nbr . 


4-nbr . 

Original  MIN -MAX  MIN -MAX 

MIN2 -MAX2  MIN2 -MAX2 

min3*max3  min3*max3 


(b) 


Figure  3.6.3.  Results  of  applying  repeated  local  MIN  and  re- 
peated local  MAX  to  three  FLIR  images . In  each 
part,  the  upper-left  picture  is  the  original;  the 
second  column  uses  4-neighbor  local  MINs  followed 
by  4-neighbor  local  MAXes  (1,  2,  and  3 repetitions, 
in  the  first,  second,  and  third  rows) ; and  the 
third  column  is  analogous,  using  8-neighbor  oper- 
ations (i.e.,  including  the  diagonal  neighbors). 
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3.6.2  Connected  component  extraction 

The  result  of  thresholding  is  a binary  image.  After 
noise  cleaning  operations  filter  this  image  (as  needed) , it 
still  remains  to  aggregate  points  into  identified  (labelled) 
regions.  A process  which  labels  the  individual  disjoint 
regions  in  the  binary  image,  in  a single  raster  scan,  is 
well  known  in  the  literature  [20]  . It  is  described  briefly 
here . 

A set  of  l’s  in  a binary  image  is  connected  if  any 
two  points  in  it  can  be  joined  by  a path  (sequence)  of  pair- 
wise adjacent  points  lying  in  the  set.  A maximal  connected 
set  is  called  a connected  component.  The  algorithm  to  be 
described  produces  the  (unique)  decomposition  into  connected 
components,  labels  the  individual  components,  and  constructs 
for  each  connected  component  a descriptive  feature  vector. 
Although  we  do  not  specify  the  features,  it  is  assumed  that 
they  are  all  extractable  from  a raster  scan  using  a 3x3 
processing  window.  Additional  storage  is  available  to  hold 
the  feature  values  for  the  components.  Section  3.8.2 
describes  the  features. 

When  a new  region  is  encountered  during  a raster  scan, 
it  is  assigned  a vector  of  registers  to  store  its  feature 
values.  As  the  region  is  being  tracked  on  the  same  row  or 
continued  on  the  next  ro w,  values  continue  to  be  accumulated 
into  its  feature  vector.  In  order  to  specify  the  corres- 
pondence between  a region  and  its  register  vector,  a label 
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is  created  and  assigned  to  each  point  of  the  region  which 
has  already  been  visited.  The  label  will  identify  the 
appropriate  register  vector,  usually  by  some  indexing  scheme. 
Region  points  found  to  be  adjacent  to  already  labelled  region 
points  inherit  that  label  and  contribute  their  feature  values 
to  its  register  vector. 

Often  a region  encountered  for  what  is  thought  to  be 
the  first  time  may  on  a later  row  prove  to  be  connected  to 
a previously  encountered  region.  Such  regions  are  called 
subcomponents . Inasmuch  as  feature  values  were  being  main- 
tained separately  for  each  subcomponent,  it  becomes  necessary 
to  combine  the  feature  values  (eventually)  and  to  create 
a flag  that  signifies  that  the  two  subcomponents  belong  to 
the  same  component.  These  flags  reside  in  the  label  equi- 
valence table.  This  table  can  be  stored  either  as  a bit 
matrix  or  as  a list. 

Since  region  labels  propagate  from  point  to  point, 
we  must  also  keep  the  labels  of  those  points  in  the  preceding 
row  that  are  neighbors  of  unexamined  points  in  the  current 
row,  with  the  labels  of  those  examined  points  in  the  current 
row.  The  amount  of  storage  necessary  for  labels  of 

points  is  thus  only  a single  row. 

The  label  assigned  to  a component  should  designate 
whether  the  component  is  above  or  below  threshold.  If  the 
background  is  not  partitioned  into  regions  (i.e.,  is  ignored) 
by  the  algorithm  then  the  data  structure  becomes  simply  a 
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list  of  above-threshold  regions.  This  is  suitable  for  many 
applications,  e.g.,  infrared  target  cueing.  In  general, 
though,  the  containment  relation  defines  a tree  structure. 

It  is  evident  that  if  two  components  of  a binary  image  are 
adjacent  then  one  encloses  the  other.  However,  if  more  than 
one  object-background  transition  has  been  detected,  one 
cannot  know  which  encloses  which  from  strictly  local  informa- 
tion at  the  time  of  initial  label  assignment.  The  determining 
condition  is  "which  region  terminates  first?"  The  region 
terminating  first  is  enclosed  by  the  adjacent  region.  Thus 
whenever  a region  terminates,  the  data  structure  is  updated 
to  reflect  the  containment  relation.  When  a region  is 
initiated  it  is  entered  onto  a "active"  list  — the  list  of 
unterminated  regions.  At  the  end  of  each  row,  the  active 
list  is  compared  with  the  list  of  component  labels  of  the 
current  label  row.  Any  active  component  whose  label  does  not 
appear  in  the  current  label  row  is  known  to  have  terminated. 
Additionally,  when  overlapping  regions  are  combined,  the 
discarded  label  is  deleted  from  the  active  list. 

It  is  possible  to  modify  the  above  to  create  a des- 
cription of  each  connected  component's  boundaries.  Such  a 
description  is  called  a "chain  encoding"  and  is  discussed 
in  [20]  . 
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3.6.3  Fuzzy  thinning 

Objects  which  are  everywhere  elongated  are  often 
thinned  dov/n  to  a "medial  line"  for  the  purpose  of  extracting 
thickness-invariant  topological  features  of  the  objects.  The 
basic  strategy  for  thinning  is  to  iteratively  delete  border 
points  (but  not  end  points)  of  an  object  which  do  not  locally 
disconnect  it.  For  binary  images,  various  parallel  algorithms 
exist.  The  recent  extension  of  the  topological  concept  of 
connectedness  to  fuzzy  subsets  allows  us  to  generalize  thinning 
to  gray  level  images  [21] . Given  thin  dark  objects  on  a 

light  background,  we  define  gray  level  thinning  to  be  the 
successive  replacement  of  points  by  the  minimum  gray  level 
of  their  neighbors  if  those  changes  do  not  affect  the  local 
fuzzy  connectedness  for  any  pair  of  neighbors.  The  result 
of  applying  such  an  algorithm  is  a set  of  high  gray  level 
"curves"  lying  on  the  ridges  and  peaks  of  high  gray  level 
in  the  original  picture.  If  the  original  picture  is  noisy 
there  will  be  many  local  peaks;  so  while  thinning  is  defined 
for  unsegmented  pictures,  a local  threshold  is  necessary  to 
overlook  these  small  noise  peaks.  Unlike  binary  thinning, 
however,  we  no  longer  need  to  distinguish  between  border  and 
interior  points  since  thinning  a homogeneous  region  will  not 
significantly  change  the  gray  level  of  any  point;  only  a 
slight  smoothing  results.  The  results  of  experiments  with 
this  technique  are  described  in  a technical  report  [22]  . 

See  Figure  3.6.4  for  examples  of  this  process. 
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Figure  3.6.4a.  Iterations  0-3  of  fuzzy  thinning 

on  LANDSAT  window  of  Monterey, 
b.  Iterations  0-5  of  fuzzy  thinning 
on  the  output  of  an  edge  detect- 
or. 
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3.7  Superslice 

The  object  extraction  task  is  somewhat  simpler  for 
FLIR  imagery  than  for  visible-light  imagery  since  the  objects 
of  interest  (military  vehicles)  are  generally  compact  regions 
of  (more  or  less)  uniform  thermal  intensity.  For  this 
reason,  thresholding  has  been  chosen  as  an  appropriate 
method  of  segmenting  the  scene.  However,  one  can  criticize 
threshold  selection  schemes  on  a number  of  grounds.  First 
of  all,  if  a window  contains  no  object  then  thresholding  it 
is  dangerous,  since  above-threshold  noise  regions  may  often 
produce  probable  looking  "objects."  Secondly,  if  more  than 
one  object  is  present  in  the  window  then  a single  threshold 
will  not  suffice.  Thirdly,  if  an  object  overlaps  several 
windows  then  there  may  be  no  consistent  representation  of 
an  object  (i.e.,  no  representation  using  a single  threshold). 
Attempts  to  divide  the  scene  up  into  overlapping  windows , 
so  that  objects  of  maximal  size  are  guaranteed  to  lie  com- 
pletely within  a single  window,  answer  this  last  objection 
at  the  cost  of  greatly  increased  overhead.  In  any  case,  the 
size  of  the  smallest  thresholdable  region  — as  well  as  the 
particular  threshold  chosen  — depends  on  the  window  size, 
the  coarseness  of  the  grid,  and  the  type  of  statistical  test 
used  to  determine  if  a region  is  thresholdable.  One  would 
prefer,  however,  to  be  able  to  extract  a small  region  regard- 
less of  the  clutter  and  noise  beyond  its  borders. 
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Another  objection  to  pure  thresholding  is  the  presence 
of  noise  regions  in  addition  to  object  regions.  Noise  regions 
may  be  difficult  to  distinguish  when  based  on  size,  shape 
or  gray  level  features.  The  broader  and  higher  the  valleys 
of  the  gray  level  histogram,  the  more  likely  that  the  noise 
regions  will  be  extensive  and  numerous. 

A final  objection  concerns  the  design  of  optimal 
thresholding  techniques  in  which  the  optimality  is  based  on 
a statistical  model  of  the  gray  level  population.  In  situa- 
tions where  an  object  contrasts  strongly  with  the  background, 
there  may  be  a number  of  thresholds  at  which  the  object  appears 
well  defined.  As  the  threshold  decreases  through  this  accept- 
able range,  each  object  exemplar  is  contained  within  a 
slightly  larger  one.  Thus  although  the  exemplars  may  each 
look  reasonable,  the  optimality  criterion  for  the  thresholding 
does  not  necessarily  choose  a "best”  exemplar.  This  is  be- 
cause the  optimality  condition  was  based  on  the  whole  window 
rather  than  on  the  component  corresponding  to  the  object. 

For  these  reasons,  a segmentation  method  which  does 
not  require  a commitment  to  a single  threshold  in  arbitrarily 
chosen  regions  of  an  image  is  preferable.  Our  method  uses 
thresholding  as  a means  of  discovering  candidate  object 
regions.  Candidates  are  then  accepted  or  rejected  based  on 
the  coincidence  of  an  edge  map  with  the  region  boundary. 

The  surviving  object  regions  are  compared  with  the  survivors 
of  other  thresholds,  and  those  that  best  match  the 
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edge  map  are  used  to  describe  the  actual  objects  in  the 
image.  Thus,  while  a number  of  thresholds  are  used,  only 
the  one  defining  the  greatest  coincidence  of  thresholded 
region  border  and  (thinned)  edge  is  deemed  valid  for  a 
particular  region.  This  method  can  be  considered  as  defining 
a best  exemplar  for  each  object  region. 
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3.7.1  Algorithm 


The  algorithm  consists  of  several  steps  as  follows: 
median  filtering;  extraction  of  an  edge  mask  by  edge  detection 
and  thinning;  thresholding;  forming  connected  components;  and 
object  validity  checking.  For  a given  picture,  smoothing  and 
edge  map  extraction  need  to  done  only  once;  whereas  thres- 
holding and  the  subsequent  steps  are  to  be  performed  over 
a range  of  threshold  sufficient  to  extract  any  objects  in 
the  picture. 

Figure  3.7.1  illustrates  the  basic  concepts  involved. 
Figure  3.7.1a  shows  several  object  windows  along  with  a number 
of  possible  thresholds  for  each.  Note  that  it  is  not  at  all 
obvious  which  threshold  is  best.  However,  when  the  edge  map 
(Figure  3.7.1b)  is  overlaid  on  the  thresholded  picture 
(Figure  3.7.1c),  we  have  much  better  guidance.  Figure  3. 7. Id 
shows  the  object  region  extracted  from  each  window  using  the 
method  to  be  described. 

A number  of  steps  of  the  Superslice  algorithm  have 
been  discussed  in  previous  sections:  smoothing  (Section  3.3.3), 
edge  detection  and  thinning  (Section  3.4.2),  threshold 
selection  (Section  3.5.1)  and  connected  component  extraction 
(Section  3.6.2).  However,  several  problems  associated  with 
threshold  selection  deserve  mention: 

a)  The  omission  of  a threshold  from  consideration 
increases  the  probability  of  missing  extractable 
regions . 
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b. 


Figure  3.7.1a.  Four  target  windows  (large  tank, 

small  tank,  truck,  APC)  thresholded 
at  seven  different  gray  levels, 
b.  Edge  maps  (thresholded  for  visibility) . 
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d. 


Figure  3.7.1c.  Edge  maps  from  (b)  overlaid  on 

(a)  . 

d.  Object  regions  extracted  by  the 
Superslice  algorithm. 
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b)  The  greater  the  number  of  thresholds  considered, 
the  greater  the  false  alarm  rate. 

c)  The  speed  of  the  algorithm  is  approximately  linear 
in  the  number  of  thresholds  used. 

The  probability  of  missing  a object  region  due  to  the  omission 
of  a single  threshold  is  the  product  of  the  probability  that 
the  scene  contains  an  object  region  and  the  probability  that 
the  object  region  is  discernible  (by  the  algorithm)  at 
exactly  the  omitted  threshold.  Although  knowledge  of  the  a 
priori  probability  is  dependent  on  a model  for  the  scene 
(which  does  not  at  present  exist) , experiments  have  demon- 
strated that  an  object  region  which  is  discernible  at  all 
by  the  algorithm  can  be  extracted  over  a range  of  thresholds  — 
dependent,  of  course,  on  the  steepness  and  homogeneity  of  the 
edge  region  bordering  the  object.  Noise  regions,  on  the  other 
hand,  do  not  tend  to  persist  over  a range  of  gray  level  thresh- 
olds. This  tradeoff  may  therefore  be  posed  as  follows:  By 
sampling  at  every  kth  gray  level,  we  reduce  the  workload  to 
a fraction  (1/k)  without  appreciably  increasing  the  false 
dismissal  rate;  however,  we  lose  some  redundancy  in  the 
extracted  data  which  would  help  us  discriminate  object  regions 
from  false  alarms. 

The  false  alarm  rate  is  a function  of  input  window  size, 
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as  well  as  a function  of  the  number  of  thresholds  and  the 
positions  of  the  thresholds  in  the  overall  gray  level  his- 
togram. Certain  thresholds  are  worse  than  others  in  producing 
false  alarms  — specifically,  those  at  or  adjacent  to  peaks 
in  the  histogram. 

After  thresholding  and  connected  component  extraction, 
each  component  must  be  validated  as  to  whether  the  extracted 
region  really  corresponds  to  an  object  in  the  scene.  If  one 
considers  validity  checking  to  be  a classification  process, 
then  one  can  compute  a large  number  of  potential  features 
and,  using  standard  techniques,  determine  a discriminant 
function.  We  have  established  three  heuristics  to  be  of 
value.  One  is  that  objects  should  be  "well-defined,"  i.e., 
have  discernible  borders.  Note  that  not  all  real-world 
regions  satisfy  this  constraint.  For  example,  in  LANDSAT 
scenes,  forests,  urban  areas  and  clouds  can  blend  into 
their  surrounds  with  no  discernible  edge.  The  second  heuristic 
is  that  an  object's  interior  should  "contrast"  with  its 
surround.  In  this  study,  contrast  is  based  on  gray  level 
difference.  However,  other  local  features  including  texture 
measures  are  worth  considering  as  defining  object  interior. 

The  third  is  that  the  region  size  lie  within  an  acceptable 
range.  The  size  test  is  applied  first,  eliminating  any 
region  with  fewer  than  20  or  more  than  1,000  points. 

"Well  definedness"  of  a region  is  measured  by  the 
percentage  of  border  points  which  correspond  spatially  to 
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(match)  actual  edge  points  in  the  edge  map.  "Contrast" 
is  measured  by  the  absolute  difference  of  average  gray 
level  between  the  border  region  of  the  component  and  its 
interior.  Figure  3.7.2  shows  a scatter  plot  of  these  two 
features  for  the  regions  extracted  from  a set  of  windows. 

A reasonable  discriminant  based  on  these  two  features 
appears  to  be:  match  > .5  and  contrast  > .6  — i.e.,  at 
least  50%  of  the  border  matches  the  edge  map,  and  the  con- 
trast is  at  least  .6  gray  levels  (out  of  64).  Note  that 
neither  feature  is  by  itself  reliable  enough  to  discriminate 
noise  regions  from  object  regions.  Optimal  discriminants 
may  be  computed  based  on  several  models.  Regardless  of  the 
particular  model  chosen,  the  discriminant  value  can  be  inter- 
preted as  a "score"  for  the  component.  Components  with  very 
low  scores  are  discarded  as  pure  noise.  In  practice,  we  have 
used  the  match  measure  as  a score  for  objects  which  were 
above  the  pure  noise  threshold. 

The  score  is  important  in  comparing  (nested)  object 

regions  corresponding  to  the  same  object.  When  an  object 

is  thresholdable  at  gray  levels  t^  > t2  >...>  t.^,  this  gives 

rise  to  k connected  components,  C £-  C.  £...£■  C . Since 

tl  t2  k 

each  C represents  the  same  object,  we  call  each  an  "exemplar." 
ti 

In  general,  we  wish  to  select  a single  exemplar  as  the  best 
representative  of  an  object.  The  score  provides  a criterion 
for  selecting  among  exemplars.  Thus,  one  could  choose  the 
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Figure  3.7.2.  Scatter  diagram  plotting 
well-definedness  against 
contrast  for  a set  of  noise 
regions  (plotted  as  periods) 
and  object  regions  (plotted 
as  hash  marks) . 
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exemplar  C with  the  highest  score.  It  is  not  always  easy, 

j 

however,  to  determine  the  nested  sequence  (C  }.  In  parti- 

ti 

cular,  if  one  object  thresholdable  at  gray  level  t is  contained 
within  another  thresholdable  at  gray  level  t'  < t,  then 
regardless  of  the  comparative  difference  between  the  two 
scores,  we  would  want  to  retain  and  Ct,.  This  situation 
can  be  handled  by  assuming  that  nested  components  whose  areas 
are  sufficiently  different  (say,  50%  change  in  size)  corres- 
pond to  different  (although  nested)  objects.  In  thermal 
images,  this  might  correspond  to  a warm  vehicle  with  a hot 
engine  compartment,  or  to  a vehicle  on  an  asphalt  road. 

The  results  of  applying  the  algorithm  to  a set  of  16  APC 
windows  are  illustrated  in  Figure  3.7.3.  Note  that  in  almost 
all  cases  (the  negative  image  was  not  processed) , the  re- 
sulting labelled  images  contain  the  target  regions  (as  well 
as  other  regions) . 

In  summary,  the  algorithm  for  region  extraction  con- 
sists of  the  following  steps: 

1.  Smooth  the  image,  if  necessary  (to  promote  clean 

thresholding) . 

2.  Extract  a thinned  edge  picture. 

3.  Determine  a gray  level  range  for  thresholding. 

4.  For  each  gray  level  in  the  range: 

a.  Threshold  the  smoothed  image. 

b.  Label  all  connected  regions  of  above-threshold 
points . 
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c 

Figure  3.7.3a.  Sixteen  APC  windows. 

b.  Edge  maps  (thresholded  for  visibility) . 

c.  Object  regions  extracted  by  the 
Superslice  algorithm. 
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c.  For  each  connected  region: 

i.  Compute  the  percentage  of  border  points 
which  coincide  with  significant  thinned 
edge  points. 

ii.  Compute  the  contrast  of  the  region  with  the 
background . 

iii.  Classify  the  region  as  object/non-object 

based  on  the  size,  edge  match  and  contrast. 

5.  Construct  the  canonical  tree  for  the  set  of  object 
regions  based  on  containment. 

6.  Prune  the  containment  tree  by  eliminating  adjacent 
nodes  which  are  too  similar. 
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3.7.2  Conformity  - a measure  of  region  definedness 

The  Superslice  algorithm  relies  on  the  heuristic 
that  thresholded  object  regions  are  distinct  from  background 
because  they  contrast  with  their  surround  at  a well-defined 
border.  The  coincidence  of  high  contrast  and  high  edge  value 
at  the  border  of  a thresholded  region  is  an  example  of  the 
use  of  convergent  evidence  supporting  the  assertion  of  the 
object  region.  The  definedness  of  the  border  may  be  evaluated 
as  the  percentage  of  the  border  points  which  coincided  with 
the  location  of  thinned  edge  (locally  maximum  edge  response) . 
Thus  a match  score  of  50%  means  that  half  the  border  points 
are  accounted  for  as  being  on  the  edge.  However,  it  does 
not  mean  that  the  matched  points  adequately  represent  the 
object.  Figure  3.7.4  illustrates  two  cases  of  50%  match. 
(Matched  points  are  indicated  by  thick  strokes.)  Clearly, 
the  second  case  is  a better  representation  than  the  first. 

The  traversal  of  the  border  of  a thresholded  region 
induces  an  ordering  on  the  matched  points.  Let  r^,...,r 
be  the  runs  of  matched  points  encountered  during  a border 
traversal.  By  connecting  the  proximal  ends  of  runs  along 
the  traversal,  one  creates  a polygonal  approximation  to  the 
thresholded  region.  We  define  "conformity"  as  the  measure 
of  match  of  the  polygonal  approximation  to  the  thresholded 
region.  High  conformity  means  that  the  region  is  well- 
represented  by  its  approximation  regardless  of  the  actual 
percentage  of  matched  border  points.  Figure  3.7.4a  illustrates 
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low  conformity;  while  Figure  3.7.4b  shows  good  conformity. 

Conformity  is  evaluated  as  the  ratio  of  the  absolute 
difference  in  area  (between  the  two  polygonal  representations) 
to  the  area  of  the  threshold  region.  Experiments  have  in- 
dicated its  utility  as  a feature  for  discriminating  noise  from 
objects.  A quantitative  study  of  its  discrimination  value 
is  described  in  Section  3. 9. 4. 2. 


Figure  3.7.4a.  Contour  whose  matched  edge  points 

(thickened  strokes)  exhibit  poor 
conformity . 

b.  Contour  showing  good  conformity. 
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3.7.3  Hyperslice  - An  algorithm  for  recursive  region 


extraction 

The  algorithm  (Kyperslice)  described  here  is  an  amalgam 
embodying  the  recursive  control  structure  of  Ohlander  [3] 
and  the  object  extraction  techniques  of  Superslice.  Hyper- 
slice consists  of  the  following  steps  [24] : 

1.  Preprocessing  - image  smoothing,  thinned  edge 
map  extraction. 

2.  Initialize  the  extracted  region  mask  (ERM)  to  the 
empty  mask.  Initialize  the  available  points  mask 
(APM)  to  the  entire  image . 

3.  Compute  histograms  for  all  feature  images  based 
on  the  APM. 

4.  Determine  a "best"  slice  range  over  all  current 
histograms  and  slice  the  corresponding  image. 

5.  Generate  submasks  for  regions  satisfying  the  Super- 
slice criteria.  Add  them  to  the  ERM;  delete  them 
from  the  APM. 

6.  Apply  algorithm  steps  3-5  recursively  to  the  back- 
ground set  (APM) . The  algorithm  should  also  be 
applied  recursively  to  each  submask  added  to  the 
ERM,  since  the  extracted  region  may  be  a union 

of  regions  discriminable  by  some  other  feature. 

Several  comments  are  in  order.  First,  the  slice 
ranges  chosen  for  Hyperslice  should  be  rather  liberal  (i.e., 
extending  beyond  valley  bottoms  in  the  histogram) , since 
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points  not  corresponding  to  well-defined  regions  will  be 
returned  to  the  APM.  The  resulting  histograms  appear  more 
natural  (not  "carved-out" ) for  this  reason.  Secondly,  the 
resulting  decomposition  is  order-dependent,  i.e.,  different 
results  may  be  obtained  if  the  order  of  selection  of  slice 
ranges  is  changed.  If  two  adjacent  regions  in  the  image 
contribute  adjacent  peaks  in  the  histogram,  then  points  in 
the  intersection  of  the  overlapping  slice  ranges  will  gener- 
ally belong  to  the  shared  edge  region.  Whichever  region  is 
sliced  first  will  tend  to  accrete  more  of  these  points. 

Since  these  points  lie  at  or  near  the  true  edge,  they  tend 
to  increase  the  edge  match  criterion  for  that  region.  Once 
they  are  removed  from  the  APM,  they  are  not  available  to  the 
adjacent  region.  Consequently,  the  edge  match  criterion  of 
the  adjacent  region  may  suffer.  This  is  most  likely  to  occur 
for  adjacent  regions  which  lack  a strong  common  border.  The 
2-dimensional  histogram  approach  in  [16]  can  detect  adjacency 
along  weak  borders.  In  practice,  the  edge  match  criterion 
is  relaxed  somewhat  from  demanding  actual  coincidence  to 
allowing  proximity  (e.g.,  a region  border  point  adjacent  to 
a thinned  edge  point  is  counted  as  a match) . 

The  algorithm  has  been  implemented  as  an  interactive 
system  of  programs.  Several  examples  illustrate  its  ability 
to  segment  images  based  on  gray  level  alone  (i.e.,  no  other 
features  were  used  to  aid  the  segmentation).  Figure  3.7.5 
depicts  a window  of  an  ERTS  frame  of  the  Monterey  area  in 
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California.  The  water  area  contrasts  sharply  with  the  land 
and  very  little  noise  is  extracted  and  subsequently  returned 
to  the  API-1.  The  subsequent  slices  extract  light  and  dark 
fields  which  contrast  with  the  undifferentiated  background 
region. 

The  second  example  is  derived  from  Ohlander's  house 
scene.  The  average  of  the  three  color  bands  provides  the 
gray-scale.  The  resulting  image  has  been  smoothed  by  3x3 
median  filtering.  The  first  slice  range  extracts  the  sky 
regions  and  the  bright  crown  of  a bush.  Next  the  shadow 
regions  appear  along  with  the  bushes.  The  somewhat  darker 
grass  is  extracted  in  the  third  slice  range.  Finally,  the 
brick  is  extracted.  Figure  3.7.6  illustrates  this  sequence. 

Images  such  as  the  Monterey  and  house  images  are 
difficult  to  analyze  since  regions  need  not  be  well  defined 
due  to  the  complexity  of  light  reflections  and  shadows. 
Nonetheless,  this  algorithm  provides  a mechanism  for  retrieving 
those  regions  which  are  well-defined. 
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c. 


a. 


b . 


e . 


Figure  3.7.5. 


a. 

b . 

c. 


d. 


e. 


Recursive  region  extraction  on 
Monterey  image . 

LANDSAT  window. 

Edge  map. 

Histogram  of  (a) , with  selected  slice 
range  indicated. 

Mask  of  slice  range.  Within  range 
points  are  white. 

Extracted  regions  mask. 
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Figure  3.7.5  (continued) 


f . 


g- 

h. 

i . 

j* 

k. 

l. 
m. 


Histogram  of  remaining  points  after 
deleting  extracted  regions  of  (e) . 
Slice  range  mask. 

Extracted  regions  mask. 

Histogram  of  remaining  points. 

Slice  range  mask. 

Extracted  regions  mask. 

Histogram  of  remaining  points. 

Mask  of  remaining  points. 
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g. 


h. 


Figure  3.7.6. 


a. 

b. 


c,f ,i,l,o. 


d, g, j ,ra. 

e, h,k,n. 

P- 


Recursive  region  extraction  on  house 
image . 

House  window. 

Edge  map. 

Histograms  after  successive  deletion 
of  extracted  regions.  New  slice  ranges 
are  indicated. 

Slice  range  masks. 

Extracted  region  masks. 

Mask  of  remaining  points. 
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o. 


Figure  3. 


P- 


7.6  (continued) 
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3.8 


Feature  extraction 


3.8.1  Feature  design 

In  this  section,  as  in  most  work  dealing  with  pattern 
classification, a "feature"  is  taken  to  be  some  numerical 
quantity  which  can  be  calculated  for  each  object 
to  be  classified.  ("Shape"  is  not  a feature,  since  many 
features,  such  as  height/width,  measure  characteristics  of 
the  shape.)  To  be  consistent  with  a high  processing  rate 
throughout,  all  features  used  in  this  study  are  based  on 
accumulatable  quantities.  That  is,  a number  of  crude 
features  have  been  chosen  (listed  in  Table  3.8.1a  ) which  are 
defined  at  each  pixel.  The  value  of  any  of  these  features 
for  a region  is  just  the  sum  of  the  values  over  all  the  pixels 
of  the  region.  These  crude  features  can  be  accumulated  as 
the  image  is  being  segmented,  and  are  therefore  immediately 
available  for  any  region  as  soon  as  it  has  been  completely 
extracted.  The  descriptive  features  actually  used  are  simple 
functions  of  these  accumulatable  quantities , so  that  once 
any  region  has  been  extracted,  brief  calculations  produce 
all  the  information  required  for  classification  of  that 
region,  with  no  further  reference  to  the  original  image. 

One  additional  feature,  "conformity,"  has  been  obtained  for 
many  of  the  images.  This  feature  requires  rather  more  post- 
processing after  region  extraction,  and  is  included  as  a 
nearly  optimum  measure  of  one  region  characteristic  which 
should  be  of  importance  in  target  detection:  cooccurrence 
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a.  Accumulatable  features  per  connected  component 


1. 

2-3. 


Symbol 

N 

SX , SY 


4-6.  SX2,SY2,SXY 


7. 

8. 

9. 

10. 

11. 


P 

E 

SPE 

SIG 

SPG 


Meaning 

Area 

EX,IY  - first  moments 

EX2,EY2,EXY  - second  moments 

Perimeter  point  count 

High  edge  point  count 

Total  edge  value  on  the  perimeter 

Total  interior  gray  value 

Total  perimeter  gray  value 
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. SG,SG2 

Total  gray  level 
gray  level 

b. 

Intermediate 

quantities 

r — 

x 

4*  1 

/sx2 

1. 

AVE 

2. 

Y 

AVE 

4*-j 

f. 

3. 

R2 

sx2 

2 

+ SY 

4. 

V 

sg2/n  - (SG)2/N2 

Table  3.8.1.  Features. 
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c.  Recognition  features 


1. 

h/w 

yave//xave 

2. 

(h/w) ' 

Ixave--8*yaveI/VWyave 

3. 

(h*w) /A 

XAVE*  YAVE^ 

4 . 

(h+w)  /P 

(XAVE+YAVE_4)//P 

►shape 

5. 

diff 

2 2 2 
(SX  -SY  )/R 

6. 

skewness 

| SXY | /R2 

7. 

asymmetry 

( (sxy)2-sx2sy2)/r4  j 

8. 

SDEV 

/v 

9. 

Gray  level 
difference 

SIG/ (N-P)  - 

SPG/P 

► brightness 

10. 

E & P 

(Number  of  perimeter  points 
at  hign  edge  local  maxima)/? 

11. 

% 

SPE/P 

> 

d.  Special  features 

1.  conformity  (See  Section  3.7.*.) 
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of  the  region  perimeter  and  points  of  high  brightness  gradient. 
This  gives  a useful  standard  for  measuring  the  adequacy  of  the 
rapidly  calculated  feature  (E&P,  in  Table  3.8.1c  ) which  is 
used  as  a measure  of  the  same  property. 

A decision  rule  is  effectively  a mapping  from  the 
feature  space  to  a lower-dimensional  space  (the  decision 
space)  in  which  each  point  is  associated  with  a fixed  class. 
While  this  structure  is  very  general,  commonly  used  decision 
rules  are  very  severe  specializations  of  this  general  scheme. 
Usually  the  initial  mapping  is  produced  by  a set  of  poly- 
nomial functions  on  the  features,  one  function  for  each 
dimension  of  the  decision  space.  Within  this  space,  the 
class  regions  are  usually  separated  by  planar  boundaries. 

Thus,  the  Fisher  method  utilizes  a single  linear  mapping 
onto  the  line,  which  is  bisected  by  a point  (at  the  Fisher 
"threshold")  to  establish  the  two  class  domains.  Speciali- 
zation of  decision  rules  places  sharp  restrictions  on  what 
constitutes  an  appropriate  feature. 

To  discriminate  tanks  from  trucks,  a naive  observer 
might  point  out  that  one  need  only  examine  the  shapes.  One 
more  familiar  with  computational  measures  would  recognize 
that  the  shape  of  an  object  involves  a great  many  features, 
but  might  suggest  that  the  height-to-width  ratio  would  be 
one  useful  feature.  However,  height-to-width,  width-to- 
height,  log (height-to-width) , etc.  are  all  quite  distinct 
features,  one  of  which  may  be  highly  effective  in  the  desired 
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decision  while  others  may  be  totally  useless.  Useful  features 
must  thus  satisfy  a number  of  conditions,  some  of  which  are 
general,  the  others  being  imposed  when  particular  simple 
decision  rules  are  to  be  applied.  The  present  classification 
study  has  considered  linear  and  quadratic  classifiers,  a 
decision  space  with  no  more  dimensions  that  the  number  of 
classes,  and  simple  boundaries  for  each  class  within  the 
decision  space.  Several  levels  of  restriction  on  the  features 
to  be  used  with  such  a classifier  can  be  stated: 

1.  Each  feature  must  exhibit  a different  distribution 
for  each  of  at  least  two  classes. 

2.  The  classes  should  tend  to  fall  in  different  value 
ranges  for  each  feature,  since  class  assignments 
in  the  decision  space  will  be  to  connected  regions. 

3.  When  the  classifier  utilizes  sample  means  and 
variances  to  estimate  parameters  for  the  mapping 
(as  those  used  here  do) , the  true  feature  distri- 
butions of  each  class  should  be  unimodal , approxi- 
mately symmetric  about  the  mode,  and  with  a 
minority  of  points  contained  in  the  wings  of  the 
distribution. 

4.  For  use  with  linear  classifiers,  each  feature 
should  have  a distinctly  different  mean  for  at 
least  two  classes.  For  use  with  quadratic 
classifiers,  it  is  only  necessary  that  some  range 
of  values  tend  to  characterize  one  class,  while 
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the  other  class  predominates  on  the  complement. 

Despite  these  "rules"  for  good  features,  it  should 
be  noted  that  for  a multi-feature  decision  scheme,  none  of 
these  rules  is  essential.  However,  only  when  some  of  the 
features  are  very  strongly  correlated  can  the  above  princi- 
ples be  violated  without  destroying  the  classification,  and 
while  this  situation  is  not  necessarily  to  be  avoided,  it 
makes  interpretation  of  decision  rules  much  more  difficult. 
Moreover,  as  a practical  matter,  features  which  fail  to 
have  the  above  properties  normally  turn  out  to  be  ineffective 
(or  worse,  countereffective)  when  employed  in  automatic 
classification.  Since  one  is  not  really  restricted  in  the 
particular  form  of  the  features  to  be  used  (but  only  in  the 
underlying  characteristic  being  represented)  one  may  as  well 
assure  that  the  features  being  considered  are,  as  far  as 
possible,  individually  effective  means  of  class  discrimination. 

Finally,  one  more  restriction  should  be  stated. 

5.  The  features  should  not  reflect  characteristics 
which  effectively  delineate  the  sample  classes, 
rather  than  the  true  classes. 

This,  of  course,  is  the  familiar  failing  of  "small" 
samples,  but  may  appear  even  in  apparently  large  enough  samples. 
In  our  data  base  (Section  3.9.1),  several  such  "extraneous 
differentiations"  did  arise.  In  cases  where  a large  number 
of  features  are  employed  in  a classifier,  there  must  always 
be  doubt  about  whether  condition  5 will  hold.  It  is  this 
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condition,  more  than  any  other,  which  restrains  the  number 
of  features  which  can  usefully  be  included  in  a classifier. 

If  an  arbitrarily  large  number  of  features  are  measured  for 
a particular  set  of  classified  samples,  it  is  virtually 
certain  that  spurious  characteristics  will  allow  them  to  be 
well  separated  by  a decision  function  based  on  those  features, 
but  there  is  no  reason  to  expect  anything  other  than  random 
classification  of  new  samples.  The  problem  is  sufficiently 
pervasive  that  a simple  means  of  dealing  with  it  could 
almost  be  elevated  to  a principle: 

5'.  Features  should  be  included  in  a classifier  only 

if  they  identify  true  differences  between  the  classes 
more  than  they  do  spurious  differences  between 
the  samples . 

While  the  above  rule  may  seem  obvious,  it  is  important 
to  realize  that  including  additional  features  that  do  not 
discriminate  between  classes  makes  the  classifier  worse , as 
the  features  may  very  well  distinguish  the  class  samples, 
even  though  they  do  not  distinguish  the  classes.  (Self- 
classification of  the  training  set  improves,  while  classifi- 
cation of  independent  test  sets  degrades.)  Class  differences 
must  be  effectively  reflected  in  the  feature  to  make  it  safe 
to  use.  "Heigh t-to-width"  ratio  is  a dangerous  feature  to 
include  in  a linear  classifier  for  target  vs  non-target  since 
its  mean  values  for  target  and  non-target  classes  may  not  be 
greatly  different  (though  the  distributions  may  differ  greatly) , 
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so  that  small  spurious  differences  in  sample  means  may 
produce  most  of  the  "strength”  of  the  feature.  In  a quadratic 
classifier,  however,  the  problem  would  be  much  less  severe, 
since  the  discrimination  provided  by  the  feature  more  nearly 
matches  the  requirements  of  the  decision  function  employed. 
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3.8.2  Computation 


The  principal  attributes  of  image  regions  v/hich  can 
be  used  to  identify  them  are  shape  and  relative  brightness. 
Corresponding  locally  accumulatable  properties  are  pixel 
coordinates,  and  functions  of  them,  and  gray  level,  and 
functions  of  it.  Additional  information  can  be  obtained 
from  the  contrast  between  the  region  and  its  surround  at 
the  region  boundary.  One  can  know  as  one  examines  each  image 
point  whether  it  is  in  the  interior  of  a region,  on  the 
region  boundary,  or  in  the  background.  Statistics  of  inter- 
est can  therefore  be  accumulated  separately  for  these  classes. 
Finally,  the  pre-computed  edge  value  (gray-level  gradient) 
is  associated  with  each  point,  and  these  values  may  be 
accumulated  or  may  be  used  to  index  subsets  of  points  (e.g., 
"high  edge"  points)  for  which  other  quantities  may  be 
accumulated  separately.  The  accumulated  features  actually 
used  are  all  of  one  or  the  other  of  the  above  types,  and 
were  listed  in  Table  3.8.1a. 

The  features  calculated  for  use  in  classification 
studies  are  given  as  Table  3.8.1c-d.  They  are  further  divided 
into  two  groups  — those  that  are  purely  shape  measures,  and 
those  that  depend  in  some  way  on  the  brightness  of  the  region 
(or  some  part  of  it) . Many  of  the  functions  appear  to  be 
straightforward  measures  of  significant  characteristics,  but 
others  seem  less  straightforward.  The  criteria  for  choosing 
the  specific  functional  forms  used  are  discussed  in  Section 
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3.9.4.  A discussion  of  the  relative  utility  of  the  features 
appears  in  that  same  section. 
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3 . 9 Region  Classification  and  Experimental  Results 


3.9.1  Data  base  description 

For  a description  of  the  complete  "NVL"  data  base 
and  its  ground  truth  see  [1] . From  it  a set  of  174  128x128 
windows  were  selected,  extracted,  requantized,  median  filtered 
and  sampled  2 to  1.  The  set  consists  of  164  target  windows 
(75  tanks,  34  trucks,  55  APC's)  and  10  non-target  (noise) 
windows.  Figure  3.9.1  displays  this  set  of  windows  and 
their  identifiers. 
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Figure  3.9.1.  NVL  data  base  consisting  of  164 

target  windows  and  10  non-target 
windows . 

a.  75  tanks. 
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Figure  3.9.1  (continued) 
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Figure  3.9.1  (continued ) 
b.  34  trucks. 
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Figure  3.9.1  (continued) 
c.  55  APC's. 
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115A  122A  123A  125A 


c.  APC's  (continued). 


2N 

8N 

14N 

20N 

26N 

32N 

38N 

44N 

50N 

56N 

d.  10  non- target  windows. 


Figure  3.9.1  (continued) 
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3.9.2  Overview  of  classification 

There  are  two  general  approaches  to  classification 
of  objects  into  a preassigned  set  of  mutually  exclusive 
categories.  The  first  might  be  called  "semantic"  classifi- 
cation. Each  category  is  examined  for  particular  character- 
istics which  distinguish  its  members  from  those  of  every 
other  category  being  considered.  These  characteristics  are 
used  to  identify  each  object  submitted  for  classification. 
(Difficulties,  of  course,  occur  if  an  object  has  none  of  the 
"key"  characteristics,  or  has  "key"  characteristics  suggest- 
ing more  than  one  classification.  Such  an  occurrence  in- 
dicates that  the  classes  suggested  simply  do  not  include 
everything  within  the  domain  of  interest,  or  are  not  truly 
mutually  exclusive  — at  least  as  defined  by  the  set  of  "key" 
features.)  This  is  a form  of  classification  which  is  ubiquitous 
in  human  experience.  Unfortunately,  in  many  cases  of  practi- 
cal importance,  the  objects  to  be  classified  cannot  be 
characterized  by  properties  which  will  always  be  observed 
within  one  class,  and  never  in  any  other  class.  If  the 
classes  really  are  well-defined,  this  difficulty  may  arise 
because  of  the  need  to  classify  using  noisy  or  poorly  re- 
solved data.  It  may  also  occur  because  characteristics 
quite  plain  to  human  observers  may  defy  expression  as  cal- 
culatable  quantities  (one  vehicle  may  be  "sleek  and  speedy 
looking",  another  "squat  and  out-of-date").  For  whatever 
reason,  when  such  incompletely  characterized  problems  arise  , 
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a method  is  required  which  provides  a computable  "best  guess" 
classification.  All  such  methods  accept  a number  of  (usually 
numerical)  features  which  are  assumed  to  be  relevant  to  the 
classification  intended.  The  distribution  of  these  features 
for  a large  number  of  objects  whose  identity  is  already  known 
is  then  used  to  provide  a rule  which  assigns  a class  to  an 
object  given  the  n-tuple  of  features  measured  for  that  object. 
Typical  rules  of  this  sort  are  simple  polynomials  over  the 
features,  whose  values  are  used  to  determine  the  class 
assignments . 

"Statistical"  classification  finds  the  best  rule  for 
a fixed  class  under  some  (usually  very  restrictive)  assump- 
tions about  the  way  the  features  ought  to  be  distributed. 

Since  the  data  available  in  this  study  appear  not  to  provide 
enough  resolution  to  produce  a semantic  classification,  we 
have  utilized  a procedure  which  includes  a statistical 
classification  component.  A completely  statistical  classifier 
was  not  used,  however.  The  full  procedure  consists  of  a 
semantic  pre-classification  of  regions  which  could  not 
represent  targets,  followed  by  a statistical  classification 
of  the  "reasonable"  regions.  This  approach  was  chosen  pri- 
marily to  ensure  greater  robustness  in  the  resulting  classi- 
fication scheme,  as  will  be  discussed  more  fully  below. 

Finally,  it  is  important  to  analyze  the  types  of 
errors  made  by  a classifier.  For  example,  a well-behaved 
classifier  should  be  wrong  more  often  on  distorted  images 
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than  on  undistorted  ones.  This  type  of  performance  may  be 
tested 'Ey  "training  a classifier  of  the  same  type  on  a 
"training  set"  of  half  the  samples,  distributed  evenly 
through  the  classes.  The  resultant  classifier  can  then  be 
used  to  reclassify  the  whole  data  set.  If  the  "training" 
and  "test"  results  are  similar,  then  the  classifier  is 
judged  fairly  stable.  If  the  results  are  good,  then  the 
classifier  can  be  considered  fairly  powerful. 

It  is  important  to  distinguish  between  human  interaction 
in  classifier  design  and  human  interaction  in  the  operation 
of  the  classifier.  The  former  is  permissible  since  the 
classifier  is  fixed  once  it  has  been  effectively  designed 
and  trained.  No  further  human  assistance  is  allowed  and 
the  classifier  is  applied  in  an  automatic  fashion  to  the 
test  set. 
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3.9.3  Detailed  classification  description 


The  objects  to  be  classified  in  this  study  are  con- 
nected regions  of  an  input  picture,  extracted  by  thresholding 
the  image.  More  than  one  threshold  may  have  been  used  on 
any  given  picture,  so  the  regions  need  not  be  disjoint;  rather 
one  may  be  entirely  contained  in  another.  For  each  region, 
a feature  vector  containing  information  about  shape  and 
brightness  (as  described  in  Section  3.8)  is  used  as  the  sole 
source  of  information  about  the  region  for  classification. 

The  extraction  procedure  has  somewhat  preselected  these 
regions,  so  that  every  region  examined  has  at  least  minimal 
(20%)  correspondence  between  its  perimeter  and  the  high- 
edge  points,  has  at  least  minimal  contrast  (.2  gray  level), 
and  is  of  roughly  appropriate  size  (between  20  and  1000 
pixels) . 
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3. 9. 3.1  Stage  1:  pre-classification 

If  the  classification  is  thought  of  as  a two- 
stage  process  (shown  schematically  as  Figure  3.9.2),  the 
first  stage  is  a crude  "semantic"  classifier  which  identifies 
some  regions  as  having  properties  which  indicate  that  they 
are  not  targets.  Thus,  all  targets  have  similar  height  and 
width,  seen  at  any  aspect  angle.  Any  region  with  h/w 
greater  than  3 or  less  than  1/3,  then,  may  be  confidently 
rejected  from  further  consideration.  Similarly,  targets 
"should"  show  some  minimal  contrast  at  their  perimeters,  a 
good  edge-perimeter  overlap,  and  small  targets  should  be  of 
nearly  uniform  brightness.  All  these  criteria  are  set  by 
establishing  numerical  thresholds  such  that  at  least  95% 
of  the  sample  targets  satisfy  the  criteria. 

This  is  called  "semantic"  classification, 
rather  than  a very  crude  statistical  classification,  because 
the  particular  criteria  used  have  been  chosen  to  distinguish 
the  targets  on  the  basis  of  physical  characteristics  of 
true  target  images.  A statistical  classifier,  even  if  it 
arrived  at  the  same  scheme,  would  be  assessing  discriminatory 
ability  on  the  sample  of  classified  regions  provided  for 
training,  and  could  reflect  any  peculiarities  which  happened 
to  distinguish  the  categories  in  that  sample.  (In  the  NVL 
data,  APC's  often  exhibit  an  asymmetry  which  is  due  to  the 
fact  that  most  of  those  in  the  sample  appear  in  only  a 
single  aspect.  An  apparently  good  statistical  classifier 
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Image 


Small  Tanks  Trucks  APC's 
targets 


Figure  3.9.2a.  The  classification  process. 
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Candidate 
target  regions 


Figure  3.9.2b.  Stage  1 - the  pre-classifier 

(for  feature  list,  see 
Table  3.8.1)  . 
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V 

Trucks  APC 1 s 


Figure  3.9.2c.  Stage  2 - the  classifier 

(for  feature  list,  see 
Table  3.8.1) . 
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could  be  formed  which  would  unhesitatingly  identify  any  APC 
in  some  other  aspect  as  a tank.) 

This  pre-classification  examines  individual 
features  to  determine  whether  they  could  be  reasonably 
associated  with  true  targets,  and  discards  "ridiculous" 
cases.  A side-effect  of  this  sorting  is  to  assure  that 
feature  values  seen  by  the  subsequent  statistical  classifier 
are  never  very  far  from  their  characteristic  values.  This 
makes  the  classifier  much  better-behaved  than  one  which 
accepts  non-normally  distributed  features  (as  most  do)  that 
have  not  been  "critiqued." 
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3. 9. 3. 2 Stage  2:  statistical  classification 

Once  the  set  of  extracted  regions  has  been 
reduced  to  a set  of  bright,  compact,  reasonably  uniform 
regions,  statistical  classification  is  used  to  assign  a class 
to  each  particular  combination  of  features  (or  rather,  to 
its  associated  region) . A great  many  kinds  of  statistical 
decision  rules  exist.  Access  to  the  MIPACS  [25]  interactive 
system  allowed  us  to  design  a decision  tree  (each  node  of 
which  is  a standard  classifier)  for  efficient  classification. 
The  system  allows  individual  decision  functions  to  be  either 
linear  (e.g.,  Fisher),  quadratic,  or  maximum 
likelihood,  and  provided  a convenient  mechanism  for  selecting 
which  decisions  to  make,  and  just  which  features  to  use  at 
each  decision  point. 

The  basic  structure  selected  was  shown  in 
Figure  3.9.2c.  The  first  node  actually  represents  a non- 
statistical  selection.  Because  of  the  wide  range  of  apparent 
sizes  of  the  target  images  (from  25  to  1000  pixels)  and  the 
consequent  wide  range  in  visible  complexity  of  detail,  it 
was  quickly  determined  that  statistical  classifiers  would 
not  provide  good  discrimination  over  the  entire  size  range. 
(Almost  every  feature  measured  showed  substantial  correlation 
with  apparent  size,  and  since  the  various  sample  classes 
happened  to  have  rather  different  image  size  distributions, 
our  earliest  classifiers  used  that  factor  as  a main  classi- 
fication indicator.)  Therefore,  the  first  step  in  the 
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classification  is  a simple  split  on  image  area  — with  all  re- 
gions of  less  than  95  pixels  going  to  the  "small"  subtree, 
and  the  remainder  passing  into  the  "large"  subtrees.  For 
several  reasons,  principally  a presumed  lesser  urgency  for 
detailed  identification  of  small  or  distant  objects  and  the 
fact  that  in  the  smallest  images  no  significant  differences 
between  the  various  target  classes  are  apparent,  the  small 
regions  are  simply  sent  to  a node  which  classifies  them  as 
(small)  "target"  or  "non-target"  — the  specific  type  of 
target  is  left  unspecified.  For  the  large  regions,  a two- 
stage  process  followed.  As  neither  APC's  nor  trucks  are 
particularly  well  characterized  by  the  features  used  and 
their  distributions  are  very  similar,  they  were  merged  into 
a composite  "truck-like"  class.  Any  region  found  to  be  in 
this  class  is  then  assigned  as  APC  or  truck  by  a Fisher 
discriminant.  (A  major  reason  for  this  breakdown  is  that 
it  permits  fairly  large  samples  to  be  used  at  an  important 
decision  point  and  relegates  use  of  the  sparsely  sampled 
truck  class  to  a relatively  inconsequential  discrimination.) 
The  principal  decision  was  therefore  between  the  "tank"  and 
"truck-like"  classes  and  the  "non-target"  class.  Two 
different  approaches  were  tried  for  making  this  decision, 
both  based  on  a quadratic  maximum-likelihood  discriminant. 
These  are  described  more  fully  in  Section  3.9.4.  One 
approach  ("fixed  classes")  applied  the  maximum  likelihood 
criteria  directly  to  the  tank,  truck-like,  and  non-target 
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classes.  The  second  approach  included  two  "reject"  possi- 
bilities as  well  — non-target,  and  unclassified  target. 
(Notice  that  the  non-target  label  is  applied  either  if  a 
region  looks  sufficiently  like  a "typical"  non-target  or 
if  the  best  label  implies  too  unlikely  a value  for  the 
features  measured.)  The  latter  approach  was  included  to 
further  minimize  reliance  on  characterizing  non-targets 
in  detail. 

Given  the  tree  structure  for  the  classification, 
the  kind  of  classifier  and  the  set  of  features  at  each  node 
were  determined.  The  number  of  features  which  can  reliably 
be  used  depends  on  the  size  of  the  sample  set  used  for 
training.  Assuming  that  the  features  are  chosen  so  as  to 
avoid  apparent  vagaries  in  the  set  of  exemplars,  one  can 
confidently  use  an  additional  feature  for  each  ten  samples 
in  the  smallest  group,  and  sometimes  may  use  up  to  one-third 
the  sample  number  (for  a linear  classifier) . As  quadratic 
classifiers  utilize  more  detail  of  the  presumed  distribution 
one  is  restricted  to  the  conservative  end  of  that  range. 

These  rules  of  thumb,  while  not  universally  valid,  are  non- 
theless  useful  guides. 

By  merging  the  truck  and  APC  classes,  we  allow 
comfortable  use  of  a quadratic  classifier  on  five  or  six 
features  at  the  main  decision  node,  while  the  smaller  samples 
make  a linear  classifier  or  a three  or  four  feature  quadratic 
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classifier  more  reasonable  at  the  lower  node.  The  "small" 


node  could  utilize  five  or  six  features  — but  one  is  hard 
pressed  to  find  even  that  many  which  provide  any  discrimin 
atory  power  at  all.  (However,  one  feature,  E&P,  is  very 
powerful  indeed.) 
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3.9.4  Experimental  results 


3. 9. 4.1  Feature  selection 

As  in  any  classification  problem,  much  of  the 
initial  feature  selection  for  the  vehicle  recognition  task 
was  carried  out  informally.  This  phase  is  largely  intro- 
spective, determining  characteristics  of  the  images  that 
seem  helpful  for  human  judgement,  then  identifying  some 
features  that  should  suitably  reflect  these  characteristics. 
This  initial  feature  set  (conveying  "shape"  and  "relative 
brightness")  is  listed  in  Table  3.8.1,  Section  3.8.  All 
of  these  features  seem  appropriate  for  use  with  linear  or 
quadratic  classifiers. 

The  features  were  examined  in  several  ways. 
First,  histograms  for  each  feature  were  produced  for  every 
sample  class.  These  histograms  were  examined 

to  see  whether  the  sample  distributions  satisfied  the  criteria 
noted  in  the  last  section.  The  differentiation  that  appeared 
was  interpreted  as  to  whether  it  was  a true  difference  be- 
tween classes,  or  simply  a sampling  anomaly.  (At  this  stage 
too,  particular  features  might  be  replaced  by  similar  features 
of  slightly  different  functional  form,  to  better  satisfy  the 
requirements  of  automatic  classification.)  Second,  those 
features  that  seemed  to  have  some  merit  were  ranked  for 
classification  power  at  each  node  of  the  decision  tree. 

The  "Automask"  method,  available  within  MIPACS,  was  used 
([25]).  Briefly,  Automask  finds,  for  each  feature,  its 
"share"  of  the  total  dispersion  both  between  and  within  sets. 
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and  finds  the  single  feature  which  produced  the  greatest 
comparative  variance  between  sets.  This  feature  is  then 
deleted  from  consideration,  and  the  other  features  reexamined 
to  find  the  next  best  feature,  and  so  on.  The  relative  merits 
of  the  features  for  each  node  are  shown  below. 


Node 

Good 

features 

Usable  features 

Small 

E&P 

(h/w) ' , (h*w)/A, (h+w) /P ,dif f , 
skewness,  asymmetry 

Large 

E&P, 

diff 

(h/w) ' , (h*w) /A, skewness , 
asymmetry , E 

Trucklike 

E , 

P 

asymmetry 

(h/w) ' , (h+w) /P, skewness , E&P 

Shape  features: 

In  the  first  stage,  the  (h/w) ' height-to-width  feature 
was  useful  in  identifying  small  bright  streaks  as  non-targets. 
In  the  statistical  classifier  for  small  targets,  shape  features 
were  individually  very  weak  in  distinguishing  targets  from 
non-targets.  For  large  targets,  diff  was  the  best  shape  fea- 
ture at  node  LARGE;  all  the  others  but  asymmetry  were  also  of 
some  use.  At  node  TRUCK-LIKE,  on  the  other  hand,  asymmetry 
was  the  best  shape  feature,  with  the  remainder  of  no  value. 

Brightness-related  features: 

Edge-border  coincidence  (E&P)  was  by  far  the  strongest 
single  feature  for  both  nodes  involving  target/non-target 
discrimination  (OBJ  and  LARGE).  For  small  targets,  it 
provides  nearly  all  the  discrimination  in  the  second  stage . 
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For  large  targets,  it  provides  evidence  which  is  well 
complemented  by  shape  information  — both  must  be  included 
for  adequate  performance.  Also  very  useful,  particularly 
at  stage  1,  is  E , which  provides  substantially  different 
information  from  E&P.  Gray  level  variance  is  used  to  some 
effect  in  the  first  classifier  stage,  but  is  not  effective 
in  the  second  stage.  Perimeter  contrast  information  appears 
to  be  much  more  effectively  conveyed  through  E^  than  dgl. 

These  rankings,  while  not  dependable  when  taken 
alone,  have  been  very  helpful  in  suggesting  which  features 
could  usefully  be  included  in  decisions  at  each  node  and 
which  should  be  omitted.  This  was  especially  helpful  in 
the  case  of  the  shape  features,  for  which  estimates  of 
relative  merit  were  not  obtainable. 

The  final  stage  of  feature  testing  was  experi- 
mental. Features  suggested  either  by  Automask  or  by  the  problem 
definition  were  included  in  decision  functions,  and  self- 
classification attempted.  In  many  cases,  the  results  were 
not  satisfactory  and  one  or  more  features  were  added  or 
deleted  until  "good"  results  were  obtained.  If  too  many 
features  were  present  in  this  classifier,  features  were 
removed  until  the  best  classification  obtained  with  an 
acceptable  number  of  features  was  found. 
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3. 9. 4. 2 Classification 


The  NVL  data  base  as  windowed  for  classifi- 
cation purposes  consists  of: 

75  Tanks 
34  Trucks 
55  APC's 

164  Target  windows 
10  Non-target  windows 
174  Total  windows 

Associated  with  each  window  was  a liberal 
threshold  range  extending  from  the  shoulder  of  the  background 
peak  gray  level  to  the  highest  gray  level  at  which  there  was 
significant  sensor  response.  Although  these  ranges  were 
manually  selected,  this  is  not  a significant  interference 
with  the  automatic  nature  of  the  algorithm  since  the  gray 
level  ranges  can  be  chosen  by  a simple  scheme  which  identifies 
the  background  peak  and  proposes  every  threshold  above  the 
peak.  (If  a coarse  temperature  calibration  is  available, 
this  task  is  even  simpler.)  See  Section  3. 9. 4. 3 for 
further  discussion. 

The  Superslice  algorithm  was  run  on  these 
windows  using  the  selected  gray  level  ranges.  Connected 
components  whose  contrast,  edge-perimeter  match  score  and 
size  were  within  tolerance  were  retained.  The  resulting 
sets  of  regions  are  described  by  the  containment  forests  in 
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Table  3.9.1.  Within  each  containment  tree.  Superslice 
selects  the  best  exemplar (s)  for  the  candidate  object  region 
based  on  edge  match.  Thus,  every  tree  has  one  or  more  best 
exemplars  associated  with  it.  All  other  (non-exemplar) 
regions  are  suppressed  since  the  algorithm  has  proposed 
better  representatives  for  classification. 

Each  containment  tree  is  manually  labelled  as 
either  "target-related"  (containing  regions  associated  with 
the  target)  or  noise  (spatially  apart  from  a target  region) 
so  that  false  dismissals  can  be  determined. 

Of  the  164  target  windows,  two  windows  (64T, 

86A)  had  containment  forests  with  no  target-related  regions 
present.  At  this  stage,  the  false  dismissal  rate  is  2/164  ^ 1% 
for  Superslice.  Determination  of  a false  alarm  rate  is  in- 
appropriate since  the  discrimination  performed  by  Superslice 
is  "object  vs.  non-object,"  not  "target  vs.  non-target,"  and 
there  is  no  ground  truth  for  the  number  of  objects  (including 
targets,  hot  rocks,  trees,  etc.)  in  the  frames. 

The  next  stage  - preclassification  - performs 
possible-target  vs.  non-target  screening.  [For  the  purpose 
of  building  the  screening  criteria  and  subsequent  classifier, 
a single  exemplar  per  target  was  hand-chosen.  No  other 
target-related  regions  were  considered;  all  noise  regions, 
however,  were  retained.]  Of  the  162  target  windows,  the  pre- 
classifier retained  161  for  a false  dismissal  rate  of  1%.  In 
addition,  44  noise  exemplars  also  survived  as  possible  targets. 
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Window 

Reference 

Number 


Lowest 

Threshold 


Containment  Forests 


IT 

23 

X (N , TTT (PPPPPP  , PP)  , NNN , N ( N , NN ) ,'rJN)  ;NN 

2T 

23 

TTTTTTTT 

3T 

25 

TTTTTTT ( PP , P ) ; NN ; NN 

4T 

30 

TTTTTT 

6T 

25 

TTTTTT 

8T 

26 

TTTTT 

9T 

24 

TTTTTT (P,P) 

10T 

25 

TTTTTT 

11T 

25 

TTTTTTTT ; NN 

12T 

22 

X(PPPPP(P,P(P,P)  ) ,N) 

13T 

20 

XX ( N , TTTT ) ;N 

14T 

22 

TTTTTTT 

15T 

30 

TTTTT 

16T 

24 

TTTTTTT 

17T 

26 

TTTTT 

21T 

26 

TTTTT 

22T 

25 

TTTTTT 

24T 

29 

TTTTT 

26T 

26 

TTTTT 

28T 

27 

TTT 

3 IT 

27 

TTT 

32T 

21 

xTtttt,n,n) 

33T 

23 

VTTTT ; N 

34T 

26 

TTT 

35T 

24 

TTTT;N 

38T 

24 

TTTTTTT 

40T 

23 

TT ; NN ; N 

42T 

24 

TTTTT (P,P (PP,PP) ) 

Table  3.9.1.  Containment  forests  of  regions  extracted  by 
Superslice (Tanks ) . "AB"  means  that  region  A contains 
region  B.  "A(B,C)"  means  that  region  A contains  the 
disjoint  regions  B and  C.  "A; B"  means  that  A and  B are 
disjoint  regions  in  the  window.  Underlined  letters  denote 
"best"  exemplars  of  the  target  region.  Target  trees  begin 
at  lowest  threshold. 


Legend : 


T target 
P partial  target 
X target  with  additional  noise 
0 target  invisible  in  noise 
N noise  region 
F fiducial  mark 

V target  region  not  present  at  this  threshold 
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43T 
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80T 

22 

TTTT ( P , PPPPP ( P ( P , P ) ,PP) ) 

89T 

23 

TTTTTTTTTT 

92T 

22 

TTT TTT TTTTTT 

95T 

24 

99T 

21 

TTTTTT TTTTTT 

105T 

24 

TTTTTTTTTT ( P , P , P ) 

109T 

23 

TT (PPPP , PPPPP , P (PPPP , PP) ) 

HOT 

24 

TTT TTTT 

114T 

25 

TTTTTTTTTT 

122T 

20 

TTTTT 

123T 

22 

TT TTTTT 

124T 

21 

TTTTTTTT 

125T 

24 

TTTTT 

126T 

23 

TTTT 

127T 

24 

TTTTTTTT 

128T 

23 

TTTT TTT 

129T 

24 

130T 

25 

TTTTTT 

131T 

26 

TTTTT 

found) 


3-129 


Window 

Reference 

Number 


Lowest 

Threshold 


Containment  Forests 


3R 

23 

X ( TTT , NNN ( NNNN , NNNNN ) ) ; N 

4R 

22 

TTTTTT ; N ; N ; NNN 

6R 

23 

OTTTTTT ; NN 

9R 

23 

X (TTTT,N (NN,N) , NN , N , N ) 

18R 

26 

vtttTn 

22R 

24 

TTTTTT 

24R 

28 

x(x7ttt(pp,p) ,n) ) 

26R 

27 

TTT;N 

31R 

26 

OTTTT 

32R 

21 

X (PP , N , N , N) ; NN 

33R 

23 

X (X (TTT , N) , N 

34R 

24 

WTTT;N;N;N 

35R 

23 

TTT;  N ; N ; N 

41R 

25 

TTTTTTTTTTT 

47R 

25 

TT TTT TTT TTT 

51R 

25 

TTT ; N ; N 

52R 

23 

TTT 

53R 

24 

TT 

54R 

23 

TT;  N 

55R 

23 

VTTT ; N ; N 

56R 

24 

TTTTTT ; NNN 

57R 

24 

TTTT  7 NNN ; N ; NN 

58R 

24 

TTTT ; NN 

59R 

23 

TTTT ; NN ; N ; N 

71R 

44 

TTTTT 

72R 

46 

TTT ; NN ; N 

77R 

27 

TTTTTT (P,P) 

100R 

23 

TTTTTTT 

104R 

27 

TTTTTTT 

109R 

27 

TTT (P ,PPP) 

123R 

27 

X (TT (P,P) ,N) 

133R 

27 

TTTT 

134R 

27 

XTTT (P , P ) 

135R 

26 

TTTT 

ble  3.9.1. 

Continued.  (Truck  windows) 
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Window 

Reference 

Number 


Lowest 

Threshold 


Containment  Forests 


21A 

26 

TTTTTT 

22A 

22 

TTTTTT 

24A 

28 

77777 

27A 

27 

VTTT ; N 

32A 

25 

TT_ 

33A 

25 

T;  N;N 

34A 

26 

TTT 

35A 

25 

TT 

37A 

27 

TTTTTTTT 

38A 

23 

77777 

42A 

24 

TTTT (PP , PP] 

44A 

28 

77777777 

45A 

26 

TTTT ; N ; N 

46A 

26 

TTTTTT 

48A 

26 

7777777 

50A 

24 

TT- 

51A 

25 

TTTT  ; N ; N 

52A 

25 

TT;N;N 

53A 

24 

TTT ; N ; N ; NN 

54A 

25 

TTT 

55A 

26 

T 

56A 

25 

TTTT 

57A 

24 

TTTT 

58A 

25 

77777 

59A 

24 

7777777 

61A 

41 

7777777 

73A 

43 

777777 

74A 

43 

TTTTTTT 

75A 

25 

TTTTTTT ; N 

76A 

26 

TTTTTTTTT 

78A 

31 

p (ppTp) 

79A 

25 

ttt(pppp,p: 

80A 

24 

7777777 

86A 

24 

FFFFF;NN;N 

90A 

25 

TTTTTTTTTT 

91A 

26 

TTTTTTTTT 

93A 

26 

TTTTTTTT (P 

94A 

26 

TTTTTTTT 

96A 

27 

777777 

97A 

24 

TTTTTTTT ; N 

98A 

24 

77777 

(no  target  related  region  found) 


Table  3.9.1.  Continued.  (APC  windows) 
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The  false  dismissal  was  66T  (small,  very  faint) . 

After  preclassification,  150  selected  target 
exemplars  and  all  44  noise  exemplars  were  split  into  a 
training  set  (74  targets  and  22  noise  regions)  and  a test 
set  (76  targets  and  22  noise  regions).  The  training  set 
was  used  to  design  the  optimum  decision  rule.  It  was  felt 
that  similar  results  in  classifying  both  sets  would  then 
indicate  that  the  classifier  had  utilized  robust  character- 
istics of  the  target  class  and  thus  could  be  expected  to 
give  similar  results  on  further  data  of  the  same  type. 

A linear  discriminant  was  used  at  the 
trucklike  node  while  a maximum  likelihood  discriminant 
was  used  at  the  small  target/non-target  node.  Five 
features  were  used  at  both  nodes,  of  which  four  were 
the  same:  (h*w)/A,  (h+w)/P,  asymmetry,  E&P.  The  fifth 

feature  was  diff  for  the  small  target  discriminant  and  skew- 
ness for  the  truck/APC  discriminant.  The  large  targets  are 
divided  into  three  classes  (tank,  truck/APC,  other)  by  a 
quadratic  maximum  likelihood  discriminant  using  six 
features:  (h/w) ' , (h*w)/A,  diff,  skewness,  E&P  and  E^. 

Two  different  procedures  for  classifying  large  regions 
( > 94  pixels)  were  tested.  One  procedure  attempted  to  dis- 
criminate between  four  fixed  classes  (tank,  APC,  truck, 
other);  the  other  procedure  used  three  classes  (tank,  APC, 
truck)  and  two  "reject"  categories  (non-target,  unidentified 
target) . Both  used  identical  polynomial  maps  into  decision 
space.  In  the  latter  classifier,  however,  the  maximum 
likelihood  class  assignment  of  a region  had  to  be  signifi- 
cantly better  than  for  random  noise  regions  (otherwise,  the 
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non-target  class  was  assigned)  and  significantly  better 
than  the  next  best  target  class  assignment  (otherwise,  it 
was  called  an  unidentified  target) . 

The  detection  results  using  the  fixed  class 
classifier  on  the  150  selected  target  exemplars  are 
summarized  by: 


Train 

Test 

Total 

Large 

53/53 

53/55 

106/108 

Small 

20/21 

20/21 

40/42 

Total 

73/74 

73/76 

146/150 

where  "M/N"  means  "M  successes  out  of  N tries."  This 
classifier  thus  appeared  to  be  robust. 

Table  3.9.2  displays  the  results  of  this 
classifier  for  all  extracted  regions,  including  all  target 
and  noise  exemplars.  A false  dismissal  for  a window  con- 
taining a target  occurs  when  no  target  exemplar  (at  any  of 
the  thresholds)  is  classified  as  a target  (i.e.,  classified 
as  tank,  truck  or  APC) . Similarly,  a false  alarm  is  any 
noise  exemplar  (i.e.,  not  associated  spatially  with  a tar- 
get region)  classified  as  a target.  However,  multiple  ex- 
emplars for  the  same  noise  region  are  counted  only  once. 

In  effect,  we  are  counting  the  image  regions  (as  opposed  to 
exemplars)  which  are  classified  as  target  regions  by  at 
least  one  exemplar.  If  a region  is,  in  fact,  a target  re- 
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Table  3.9.2  (continued):  APC  windows 


Thresholds 


00 


r- 


VO 


LT) 


ro 


cn 


o 


ON 

ro 


oo 

ro 


co 


vo 

ro 


lo 

ro 


ro 


ro 

ro 


eg 

ro 


ro 


o 

ro 


ON 

CN 


00 

CN 


r- 

CN 


VO 

CN 


in 

CN 


CN 


ro 

eg 


CN 

CN 


CN 


O 

CN 


0) 

E 

H 

Pn 


+ + 
T + + 

+ + + 

+ 4-  + 

+ + + 

+ jf  + 

+ 

+ 


+ + 
+ + 
+ + 
+ + 
+ o 


o 

T + 
+ + 
o + + 

0 0 4- 

o jf  + 

4-  o 


+ 

+ o 
4-  o 

O 4"  4-  + 

0 4-  o + 


o + 
4-  4* 

4-  + 

4-  jf 

+ 


+ + 

4-  + 

4-  + 

4-  + 

o 4* 

+ 


o4-o 

0 4-4-4- 

+ o + + *+ 

+ + + + + + 

+ + + + + + 

+ 4-4-4-4-4- 

"+  + + + + + T 

+ + + + + + 

+ + + o + + 

I 4 + + 

I + + 


o 

r— i r— i 

o + + 

o o + + 

+ + o + + 

+ + + + + 


+ + + 

+ + + 

+ + + 

+ + + 

+ + + 

+ + 


+ 

+ 

+ 

+ 

+ 


< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

< 

r- 

00 

ON 

rH 

ro 

in 

VO 

00 

ON 

o 

VO 

o 

H 

ro 

VO 

r- 

00 

rH 

CN 

rH 

CM 

m 

in 

in 

in 

VO 

r^> 

00 

00 

ON 

ON 

ON 

ON 

ON 

ON 

ON 

O 

O 

rH 

<H 

»H 

rH 

rH 

•H 

*H 

rH 

3-142 


114A 


Thresholds 


00 

r^ 

vo 

in 

co 

CM 


o 

cn 

co 

oo 

co 

r- 

co 

vo 

co 

in 

co 

co 

co 

co 

cn 

CO 


CO 

o 

CO 

a\ 

CM 

00 

CM 

CM 

VO 

CM 

in 

CM 

CM 

CO 

CM 

CM 

CM 


O O 

f—y 

+ + 

o T + + 

+ + + 

o + + + T 

+ + + + + 

+ + 


CM 

O 

CM 


< 

< 

< 

< 

< 

< 

CM 

CO 

in 

r- 

C\ 

O 

CM 

CM 

CM 

CM 

CM 

CO 

iH 

r— 1 

r— 1 

c-H 

c-H 

H 

3-143 


w 

TJ 

rH 

O 

XI 

0) 

<D 

U 

rC 

Ei 


00 


r- 


vo 


in 

'sT 


co 


CN 


O 


o> 

co 


oo 

co 


co 


VO 

CO 


in 

co 


co 


co 

co 


CN 

co 


o 

co 


CT\ 

CN 


00 

CN 


CN 


VO 

CN 


in 

CN 


CN 


CO 

CN 


CN 

CN 


CN 


O 

CN 


I 

cd 

u 

fa 


I I 


w 

s 

o 

TJ 

G 

*rH 

<1) 

0) 

*H 

O 

55 


<1) 

3 

C 

•H 

-P 

c 

o 

o 


CN 

a\ 

co 

a) 

« — i 

x 

cd 

Eh 


5 

S3 

£ 

£ 

J3 

£ 

52 

!3 

£ 

£ 

CN 

00 

o 

VO 

CN 

00 

o 

VO 

rH 

CN 

CN 

CO 

CO 

in 

in 

3-144 


gion  and  some  exemplar  of  it  is  called  a target,  that  is  a 
success.  If  no  exemplar  is  so  called,  then  a false  dis- 
missal has  occurred.  Finally,  if  the  so-called  target  re- 
gion does  not,  in  fact,  contain  a target,  then  a false  alarm 
has  occurred. 

The  classifier  results  consist  of  6 false 
alarms  and  3 false  dismissals  from  the  162  target  windows 
and  2 more  false  alarms  from  10  non-target  windows . No 
window  contained  more  than  one  false  alarm  cue.  Details 
are  as  follows: 


False  Dismissals 

False  Alarms 

32R 

3T 

35R 

11T 

33A 

3R 

56R 

59R 

86A 

2N 

8N 

Figure  3.9.3a  displays  the  6 (total)  false  dismissals. 

Masks  of  the  8 false  alarms  along  with  their  gray  level 
windows  are  shown  in  Figure  3.9.3b. 

The  question  of  how  target  identifications 
can  be  made  in  this  environment  of  multiple  exemplars, 
while  secondary  to  the  task  of  detection,  is  an  interesting 
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64T  66T  32R  35R 


Figure  3.9.3.  Classification  results  for  NVL 
data  base. 

a.  Six  false  dismissals. 

b.  Eight  false  alarm  region  masks 
with  their  gray  level  windows. 
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one.  Since  each  exemplar  in  a containment  tree  can  be 
classified  independently,  there  are  many  ways  of  arriving 
at  a final  region  label.  Section  3.9.5  discusses  the  use 
of  context  and  considers  the  identification  of  object  re- 
gions from  the  classifications  in  their  containment  trees 
as  an  example  of  context.  We  discuss  the  issue  here  simply 
from  the  point  of  view  of  critiquing  the  classifier  perfor- 
mance. For  each  containment  tree  containing  at  least  one 
exemplar  classified  as  a target,  we  chose  the  target  type 
of  the  exemplar  with  the  best  edge-match  (E&P)  score  in 
the  tree  and  used  that  target  type  to  designate  the  region. 
In  the  event  that  the  "best"  exemplar  was  not  described  as 
a target,  we  labelled  the  object  region  "unknown  target". 
Only  large  targets  were  considered,  since  small  targets 
while  detectable  were  not  considered  identifiable. 

In  a test  which  classified  all  best  exemplars 
of  large  targets  (55  tanks,  21  trucks,  36  APC's)  the 
between-types  confusion  matrix  was: 


A priori 


" T 
< Tr 
A 

V — 


classified  as 

/ * * 

T Tr  A UT 

40  5 6 4 

6 8 7 0 

9 5 20  2 


where  "UT"  is  the  "unknown-target"  type.  The  8 false 
alarms  were  classified  as  1 truck,  2 APC's,  and  5 small 
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targets.  Between-class  confusion  is  high,  with  tanks 
being  the  most  successful  class.  Trucks  and  APC's  were 
often  confused  with  tanks.  A number  of  reasons  can  be  ad- 
vanced for  this  performance.  First,  tanks  were  the  most 
numerous  target  and  therefore  could  be  identified  most  con- 
fidently. Second,  large  APC's  appeared  with  the  wooden  wave 
deflection  board  in  view,  producing  a characteristic  "c" 
shape.  No  attempt  was  made  to  utilize  this  special  know- 
ledge. Third,  the  large  targets  appeared  in  only  a single 
aspect  and  no  generalized  shape  descriptors  separating  the 
different  types  could  be  extracted  realiably.  It  seems 
most  sensible  to  model  the  target  types  as  three-dimensional 
objects  and  to  derive  discriminators  from  their  inherent 
shape  and  size  differences  from  all  aspects. 

The  second  classifier  (which  applied  a 
threshold  to  reduce  the  false-alarm  rate)  did  not  improve 
classification  as  might  have  been  expected.  Any  threshold 
which  would  have  reduced  the  number  of  false  alarms  also 
caused  a number  of  false  dismissals.  Thus  while  the 
method  might  be  of  use,  its  utility  could  not  be  judged  on 
the  limited  data  set  available  especially  since  there  is  no 
model  relating  the  false  alarm  rate  to  the  false  dismissal 
rate . 

We  may  summarize  the  principal  classification 
results  as  follows:  the  false  dismissal  rate  of  the  system 
is  less  than  4%,  giving  a system  detection  rate  of  96%. 
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The  false  alarm  rate,  based  on  the  number  of  false  alarm  re- 
gions per  unit  area,  is  8 false  alarms  in  174  (128x128) 
windows.  Assuming  there  are  500x800  pixels  per  frame  and 
that  a target  occupies  about  1/10  of  a window,  we  conclude 
that  the  total  processed  area  corresponds  to  about  6 
frames.  Thus  the  false  alarm  rate  is  8/6  or  1.3  per  frame. 

A separate  test  of  the  false  alarm  rate  was  made  using  a 
set  of  four  512x512  pixel  frames  (Figure  3.9.4).  All 
available  targets  were  detected.  In  addition,  4 large 
false  alarms  and  8 small  false  alarms  were  detected  (see 
Figure  3.9.5).  However,  5 of  the  8 small  false  alarms 
corresponded  to  fiducial  marks.  Moreover,  one  large  false 
alarm  (in  FI)  appears  to  be  a target.  In  any  case,  7 
false  alarms  in  4 frames  agrees  well  with  the  previous 
estimate  of  the  false  alarm  rate. 
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Figure  3.9.4.  Four  256x256  frames  (after  median 

filtering  and  sampling) . 
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Figure  3.9.5.  Cued  regions  in  the  four  frames 

of  Figure  3.9.4.  All  targets 
were  detected  (masks  indicated 
with  arrows) , along  with  12  false 
alarms  (5  corresponding  to 
fiducial  marks) . 
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3. 9. 4. 3 Threshold  selection  evaluation 


Our  method  of  threshold  range  selection  was 
described  previously.  However,  it  bears  repetition  in  this 
section.  Using  the  histogram  of  gray  levels  (perhaps  of 
the  previous  image) , choose  as  a range  the  sequence  of  gray 
levels  from  the  mode  to  the  highest  giay  level  with 
appreciable  response  (e.g.,  more  than  5 points).  The  pre- 
vious subsection  demonstrated  that  this  brute  force  approach 
gave  excellent  system  detection  efficiency.  Naturally,  the 
liberal  range  of  thresholds  has  important  effects  on 
system  architecture,  as  discussed  in  Section  4. 

Since  the  number  of  thresholds  used  deter- 
mines the  time  cost  (in  a sequential  implementation)  or 
the  hardware  replication  cost  (in  a parallel  implementation) , 
it  is  appropriate  to  consider  methods  which  can  accommodate 
a limited  number  of  thresholds.  "Intelligent”  methods  of 
threshold  selection  are  discussed  in  Sections  3.5  and  3.10.1. 
We  wish  to  consider  "brute  force"  methods  which  select 
thresholds  at  every  other  gray  level , at  every  third  gray 
level,  etc. 

As  may  be  seen  from  Table  3.9.2,  correct 
target  detections  for  single  windows  tend  to  occur  in  ex- 
tended runs.  Table  3.9.3  provides  a histogram  of  run 
lengths.  In  general,  large  targets  had  better  contrast 
and  their  detections  were  stable  over  long  runs.  Small 
targets  were  fainter  and  were  detectable  over  only  a few 


3-152 


Run  length 

# of  windows 

Cumulative 

count 

% of  164 
windows 

0 

5 

164 

100 

1 

17 

159 

97 

2 

25 

142 

87 

3 

29 

117 

71 

4 

27 

88 

54 

5 

19 

61 

37 

6 

12 

42 

26 

7 

10 

30 

18 

8 

8 

20 

12 

9 

7 

12 

7 

10 

3 

5 

3 

11 

2 

2 

1 

Table  3.9.3.  Statistics  of  longest  runs  of 
correct  target  detections  in 
164  target  windows. 
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thresholds  at  most.  Table  3.9.3  shows  what  percentage  of 
the  targets  were  detected  within  runs  of  I or  longer  for 
I = 1,2,...  . Thus  the  false  dismissal  rate  would  be 

11%  if  every  other  threshold  in  the  range  were  omitted. 

Since  there  were  so  few  false  alarms,  it  is  not  possible  to 
give  comparable  statistics  of  any  reliability,  but  any 
scheme  which  considers  fewer  exemplars  is  bound  to  detect 
fewer  false  alarms. 

From  a slightly  different  point  of  view,  we 
might  consider  how  to  allocate  a fixed  number  of  thresholds 
within  a given  gray  level  range.  In  the  Hardware  Technology 
section,  the  design  assumes  that  five  thresholds  were  im- 
plemented in  parallel  hardware.  Thus,  for  a gray  level 
range  of  10,  thresholds  would  occur  at  every  other  gray 
level;  for  a range  of  20,  thresholds  would  occur  at  every 
fourth  gray  level.  If  we  use  the  gray  level  ranges  in- 
dicated by  brackets  in  Table  3.9.2  and  distribute 
N (=1,2,3,...)  thresholds  equally  spaced  (where  feasible) 
throughout  the  range,  we  compute  the  following  results: 

N # False  Dismissals  # False  Alarms 

125  1 

2 14  3 

3 7 7 

4 and  above  5 8 

Thus,  for  four  or  more  thresholds  equally  spaced  through- 
out the  available  gray  level  range  of  each  window,  no 
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additional  false  dismissals  occurred  beyond  those  already 
dismissed  using  the  whole  range.  Interestingly,  for  small 
N the  increase  in  false  dismissals  is  just  about  compensated 
by  the  decrease  in  false  alarms . One  is  doubled  as  the 
other  is  halved. 

Naturally,  the  threshold  ranges  depend  both 
on  window  size  and  on  window  content.  It  is  therefore  not 
likely  that  three  thresholds  will  be  sufficient  in  practice. 
The  best  choice  of  N,  the  number  of  thresholds,  will  result 
from  estimating  the  probability/cost  tradeoff  for  faint  tar- 
gets. Given  a range  of  x gray  levels  for  target  regions,  N 
should  be  about  x/2  or  x/3 , which  for  the  current  data 
base  suggests  that  N should  lie  between  5 and  10 . For  an 
extension  to  image  sequences,  see  Section  3.10.1. 
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3. 9. 4. 4 Classifier  extension 


An  attempt  was  made  to  apply  the  classifier 
derived  from  the  NVL  data  base  to  a different  set  of  thermal 
images.  The  Alabama  data  base  is  a set  of  imagery  taken 
with  a thermoscope.  The  actual  sensor  data  are  classified; 
radiometric  noise  was  added  to  mask  the  source.  Figure  3.9.6 
exemplifies  the  type  of  imagery  involved.  The  gray  level 
histograms  are  not  smooth  and  in  some  cases  runs  of  gray 
level  bins  contain  no  points.  Median  filtering  (using  odd 
sizes)  cannot  be  used  to  smooth  such  images  since  it  pre- 
serves false  contours.  Median  filtering  using  even  sizes 
provides  a small  degree  of  smoothing.  We  elected  to  smooth 
by  locally  averaging  over  a 2x2  neighborhood  just  to  intro- 
duce sufficient  gray  level  variation  so  that  5x5  median 
filtering  would  be  effective. 

The  resultant  images  were  windowed  and 
threshold  ranges  were  selected.  The  Superslice  algorithm  was 
then  applied  in  order  to  extract  candidate  object  regions. 

It  was  necessary  to  increase  the  contrast  threshold  since 
the  inherent  contrast  (including  false  contours)  was  higher 
than  in  the  NVL  data  base.  With  this  adjustment,  the  Super- 
slice algorithm  extracted  regions  corresponding  to  64  out  of 
65  targets.  After  classification,  60  out  of  65  were  detected. 
In  addition,  there  were  3 false  alarms  in  the  48  64x64  win- 
downs  considered  (although  one  of  the  false  alarms  appears 
to  be  a target  missing  from  the  ground  truth) . 
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Figure  3.9.6.  Alabama  data  base  (selected  frames). 
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Figure  3.9.6  (continued) 
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3. 9. 4. 5 Feature  data  base 


This  subsection  tabulates  for  each  window 
all  exemplars  and  their  associated  feature  vectors  (Table 
3.9.4).  It  also  includes  the  feature  weightings  for  each 
node  of  the  classifiers  and  the  associated  thresholds 
(Table  3.9.5) . 


3-159 


r»ws 


(C*  lM*f$H  tOb  A H/V  CH/M  > ' 


N«M/A  <M*V)/0  DIM 


tocv  bt  oirr  Up  t * 


61 

1! 

Hi 

151 

W 

81 

<61 

}ll 

jn 

hi 

30  t 

♦fl 

431 
ill 
A6  I 

58! 

6tl 

111 

HI 

£61 

!S! 
• 01 

??i 

:?i 


im 
mi 
i ?? 

126  T 

HM 

MU 

Hi 

;?{ 
w» 
>si 
Si  r 

Hi 

Hi 

HI 

6if 
6«f 
661 
1 T 


6 

1 

.! 

I 

i 

it 

is 

} 


« t.-« 

3 2 i.«6 

i!  l;ii 

ii  t:U 
I?  fell 

I!  !:8 

if  ?:« 
i\  ?:« 
ft  b\i 

if  0.33 

3 t:Sl 

28  6.*U 

28  0.71 

2<>  6.iS 
2i  i.16 
66  6.28 
*6  *.33 

66  0.?6 

H »:?! 

I!  !ii! 

n vm 

ii  till 
h *:IT 
« i:i! 


■M 

.6  1 
.6? 

At 

it 


lir 

At 

.66 

It] 

At 

.61 

ill 

■M 

:t? 

:V, 

:H 

:!? 

:i» 


M 
• 62 
.36 

:fl 

:H 

:« 

.66 

ii; 

:i: 

:53 


1:31 

1.62 

1.3? 

I:H 

1:55 

1:H 

‘ ‘1 


:5?  :M 

.32  .72 
.63  .70 


8 1 lit 


:li 

:i? 

il! 

:1J 

.10 

itt 

.iv 


.72  .10 

■;ii  ;!i 


]:» 

:H 

1:11 


1:1? 

i:B 

1.18 

1.2i 

J:?8 

i:H 

l:H 

’:?5 

:ii 

:« 


:M 

:i? 

:ii 

.66 

•M 

.63 
• 6 • 


: I 
:|l 

III 

!il 

ii! 


:58 

:>i 

:H 

•P 

!ji 

:?! 

:SJ 

:« 

.32 

.*•6 

.30 

:8 

:H 


Hit 

S;fi 

!:?> 

:!8 

ife 

1:88 

1.2S 

lift 

f:|? 


:!l 

:i| 

.63 

.39 

•iil 


1:11 

3:1? 

1:8 


Hit 
5:« 


:« 

.71 

.6® 

:S5 

:n 

:W 

ii! 

:53 

.61 

:3t 

Isi 

.63 

.67 

.33 

l03 

:i! 

.39 

.63 

.39 

.67 

:J» 

:H 

:?! 

:K 


lit!  1:8 

iili  !•:?? 
few 
?:?? 


5.16 

1.96 

1:13 

’:?! 

.99 

1.63 


1:1 

hi 

1:8 

1:11 

3:i! 

rM 

?:>o 


2.62 

1:i« 

5:53 

1:85 

i.ai 

1:53 

1 .»5 
/.6l 

•1 

1:5? 

1.03 

1:51 

1.0 

if 

l:il 

i:il 

5:51 

1:1? 

2.03 


il  i:ii 


5:2? 


:H 

:>! 


1:5! 

1:?1 


.ef 

.S« 

:H 

• 03 
.06 

Ai 

.63 

ill 


il!  ill! 
lit  iili 

:ll  1:1* 
:il  i:tl 
:SJ  1:15 

ill  ii?! 

:» 


i 

if 

1:6?  i:?f 

:?f 

•At 

-:W 

i:U 

:?t 

lit 

:»i 

i:H 

3 

6 

3 

it 

25 

f:U  :ji 

3.50  .89 

:8 1 \ 

ii!  1 

.43 

ill 

.43 

il! 

fill 

:1? 

.09 

1:11 

1.69 

1:« 

2.66 

ii! 

til? 

6 

4 

26 

t:51 

:52 

Al 

•:» 

1:?? 

:» 

(:1! 

i:» 

:51 

t:?i 

1 :i1  *A1 

:i? 

:«  :?1 

1 

? :«  f:i? 

:« 

*:U  «:«  :?< 

l 

it 

il 

21 

if 

21 

H 

ii 

H 
1 1 


ii 

fi 

61 

| 

■ I 

?! 

91 

?i 

181 
T 

I 

ni 

Ml 

in 

ill 

if! 

Hi 

161 

Ui 

111 

Ui 

It! 

161 

101 

161 

Ui 

101 

Hi 


il  hit 
it  1:11 


I 1 !! 


if 


8 


s 


Al  *:« 
:5i  *:>3 


.31 

:>9 

-M 

.73 

:?? 


.61 

.34 

.4« 

.67 

:18 

:!? 
• o3 
.71 

:i! 

.67 

.64 


2.08  2.27 


2.99 

2.88 


1 , 

ii!! 

m 


.19 

.70 

.03 

:?' 

:U 

ill 

iff 

.07 

ill 

:K 

:W 

.70 

if! 

■M 

iff 


• 

Ii!? 

1:8 


1.62 

»:» 

. J9 
.38 
.41 

.70 

.70 

.70 

2.32 

f:>? 

.70 

.76 

.73 

2.63 

1:31 

2.71 

i:!5 

lit! 

:tl 

.40 

:H 

.68 

2.11 

:?5 

.68 

liH 

5:H 

0.41 

1:H 

•At 

f:?i 

1:71 

Table  3.9.4, 


Recognition  feature  values  for  all 
candidate  target  regions  from  174 
windows . 
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Table  3.9.4  (continued) 


b.  Trucks. 
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Table  3.9.4  (continued) 
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Table  3.9.4  (continued) 

d.  Non-target  windows. 
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1.  20  iN  s 1000 

2.  Gray  level  difference  £ 0.2 

3.  E&P  ^ 0.2 


a.  Superslice  decision  thresholds.  Candidate  object 
regions  must  satisfy  all  of  these  conditions. 

1.  (h/w)  ' a.  1.0 

2.  (h*w)/A  s.  2.5 

3.  | SDEV-3 | a,  2.5 

4.  (E&P  * 0.4)  V ((E&P  * 0.25)  A (logeN  > 4.5)) 

5.  E * 0.75 

P 

6.  Max(h,w)  < 64 

b.  Preclassifier  decision  thresholds.  Candidate  target 
regions  must  satisfy  all  of  these  conditions. 


Table  3.9.5.  Classification  decision  threshold. 

(Feature  variables  are  defined  in 
Table  3.8.1.) 
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Feature 

factors 


Noise 

coefficients 


Target 

coefficients 


c3 

c4 

c5 

cl 

clO 

c3 

c3 

c4 

c4 

c5 

c5 

c7 

c7 

clO 

clO 

c3 

c4 

c3 

c5 

c3 

cl 

c4 

clO 

c4 

c5 

c4 

cl 

c4 

clO 

c5 

cl 

c5 

clO 

cl 

clO 

. 898+02 
. 858+02 
. 817+01 
. 675+02 
. 887+02 
818+02 
834+03 
466+01 
334+02 
719+02 
. 833+03 
. 709+01 
341+02 
624+02 
697+02 
. 329+03 
. 313+03 
.111 +02 
. J 51+02 
594+02 
785+02 


. 851+03 
. 381+03 
853+02 
. 854+02 
. 894+03 
3 18+03 
3 32+04 
- 180+02 
- 349+02 
106+03 
. 435+03 
800+02 
887+02 
3 21+03 
. 342+03 
. 783+02 
. 852+02 
. 866+02 
678+01 
476+01 
357+03 


Table  3.9.5c.  Maximum- likelihood  discriminant 

used  at  small  node  using 
features  c3,  c4 , c5,  c7 , clO.  A 
region  is  small  if  log  N s»  4.5. 
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Feature 

factors 


Noise 

coefficients 


Truck-like 

coefficients 


Tank 

coefficients 


c2 

c3 

c5 

c6 

clO 

ell 

c2 

c2 

c3 

c3 

c5 

c5 

c6 

c6 

clO 

clO 

ell 

ell 

c2 

c3 

c2 

c5 

c2 

c6 

c2 

clO 

c2 

ell 

c3 

c5 

c3 

c6 

c3 

clO 

c3 

ell 

c5 

c6 

c5 

clO 

c5 

ell 

c6 

clO 

c6 

ell 

clO 

ell 

.169+03 

.519-01 

-.551+01 

.632+02 

♦259+03 

.157+03 

-.325+02 

-.112+02 

-.454+01 

-.500+01 

-.153+03 

-.264+02 

.526+01 

-.255+01 

-.211+02 

-.808+02 

-.487+02 

.283+01 

.175+01 

.377+02 

.700+01 

-.230+01 

.127+02 

.266+01 

-.205+02. 

-.184+02 

-.774+02 

-.267+03 


.277+02 
.174+02 
.216+01 
.262+01 
.683+02 
-.132+01 
-.160+02 
-.658+01 
-.520+01 
-.339+00 
-.582+02 
-.688+00 
-.857+01 
.151+01 
-.892-01 
-.203+02 
.248+01 
-.303+01 
.835+00 
-.220+01 
. 125+01 
.387+00 
.647+01 
-.641+00 
-.155+01 
-.236+00 
.597+01 
-.413+02 


. 179+02 
.698+02 
.138+02 
.913+01 
.910+02 
-.189+01 
-.273+02 
-.176+02 
-.132+02 
-.457+00 
- . 473+02 
-.367+00 
-.878+01 
.222+02 
-.951+00 
-.119+02 
.323+01 
.157+01 
-.287+01 
-♦ 199+02 
.550100 
-.942+00 
- . 124+02 
-.545100 
-.368+01 
. 150+00 
.424+01 
-.924+02 


Table  3.9.5d.  Maximum-likelihood  discriminant 

used  at  large  node  using  features 
c2 , c3 , c5 , c6 , clO , ell.  A re- 
region is  large  if  logeN  > 4.5. 
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Feature 

Fisher  direction 

(h*w) /A 

-.45 

(h+w) /P 

.088 

skewness 

.028 

asymmetry 

.56 

E&P 

-.69 

Threshold  = -.7736 


Table 

3.9.5e.  Fisher  linear  discriminant 
used  at  truck-like  node 
using  features  c3,  c4 , c6, 
cl,  clO. 
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3.9.5  Classification  and  context 


Our  approach  to  the  target  cueing  problem  has  been 
to  extract  and  classify  object  regions  independently  of  one 
another.  That  is,  segmentation  is  based  on  the  assumption 
that  the  object  regions  are  individually  thresholdable , 
though  not  necessarily  by  the  same  threshold.  Classification 
is  based  on  information  derived  from  measurements  on  the 
individual  components  but  does  not  take  into  account  the 
intra-  and  inter-frame  context  of  a region. 

The  Gestalt  laws  of  grouping  (see  [26])  are  of 
interest  in  this  respect  since  they  refer  to  factors  that 
cause  some  parts  to  be  seen  as  belonging  more  closely  to- 
gether than  others.  These  rules  are  applications  of  the 
basic  principles  of  similarity  which  assert  that  region 
association  is  partly  defined  by  region  resemblance. 

There  are  several  types  of  similarity  which  could  be 
used  with  FLIR  imagery,  e.g.,  similarity  of  appearance 
(size,  shape,  brightness,  etc.),  similarity  of  location  or 
proximity,  similarity  of  spatial  arrangement,  and  temporal 
similarity  (multiple  views  of  the  same  object  in  different 
frames) . 

Whenever  one  can  confidently  group  a set  of  N objects 
as  being  similar  (based  one  or  more  of  the  types  of  context 
discussed  above) , it  may  be  advantageous  to  classify  them 
collectively  (The  Compound  Decision  problem)  rather  than 
independently  (The  Simple  Decision  problem) . 
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The  compound  decision  problem  can  be  stated  briefly 
as  follows: 

There  are  a set  of  states  of  nature  ft  = {1,2, ...y} 
and  a set  of  actions  A = {1,2, ...s},  associated  with  an  rxs 
loss  matrix  being  defined  for  every  i €ft  and  j tA.  When 
the  same  decision  problem  is  confronted  N times,  there  exists 
a vector  - {9^ , ' * * * states  of  nature  where  0 t ftN 
and  a corresponding  vector  XN  = {x^  ,yl^  , . . .x^}  of  random 
variables.  0^  denotes  the  state  in  the  kth  problem  and  the 
distribution  of  x^  is  P ( I ® ) * For  a given  9^'  ^ is  inde- 
pendent of  other  x's  and  0's.  In  other  words 

pO%lV  - X PUjIe.) 

We  do  not  assume  that  the  0’s  are  necessarily  independent. 

The  loss  in  the  compound  decision  problem  is  taken 
to  be  the  average  of  the  losses  incurred  at  each  of  the  N 
decisions  and  the  compound  risk  is  defined  correspondingly. 

If  all  the  observations  X„  are  at  hand  before  the 
individual  decisions  must  be  made,  one  can  use  a compound 
decision  rule  ^ . • . t^ } where  t^  = t ( j | XN ) for 

each  XN  is  a distribution  over  A,  according  to  which  the 
kth  action  is  chosen.  Also  one  can  define  a sequential 
compound  decision  rule  if  only  the  observations x^  are  at 
hand  before  the  kth  decision  is  made. 
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It  is  possible  to  work  out  a decision  procedure  which 
is  compound  Bayes  against  the  distribution  G($N)  where 
8n  t (for  the  details  see  Abend  [27]). 

The  continuation  of  this  project  will  be  concerned 
with  the  use  of  context  in  general  and  with  the  compound 
decision  rule  in  particular  as  a way  of  combining  related 
observations.  Naturally,  this  will  require  a data  base  which 
is  sufficiently  structured  to  provide  the  necessary  context. 
However,  a recasting  of  the  problem  makes  another  type  of 
context  available. 

Consider  a set  of  nested  regions  (exemplars)  produced 
by  the  Superslice  algorithm.  We  wish  to  investigate  how 
these  regions  can  be  treated  in  ensemble  as  defining  (perhaps) 
a target  region.  This  suggests  the  following  experiment: 

Given  a set  of  object  regions  generated  by  Superslice,  class- 
ify them  independently.  Choose  a nested  region  of  signi- 
ficance: namely,  a subtree  in  the  containment  forest 

(corresponding  to  a given  window  or  frame)  all  of  whose 
paths  from  the  root  to  the  terminal  nodes  are  of  length  ^nt 
where  0 it  &1  and  n is  the  number  of  thresholds  used  by 
Superslice.  This  insures  that  for  a proper  choice 
of  t we  only  consider  nested  regions  which  keep  appearing 
for  a large  fraction  of  the  total  number  of  thresholds. 

For  each  such  nested  region  (subtree) , suppose  that 
there  is  a class,  say,  w (tank,  APC,  truck  or  noise)  such 
that  M of  the  N objects  in  the  subtree  have  been  assigned 
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to  w and  where  H is  the  length  of  the  longest  path  in 

the  subtree.  (This  rule  insures  that  for  a proper  choice 
of  t the  chosen  class  w really  dominates  the  entire  subtree.) 
We  then  assign  the  class  w to  all  of  the  N objects  in  the 
subtree.  Otherwise  we  leave  the  individual  classifications 
unaltered. 

In  an  experiment  using  the  NVL  data  base,  315  objects 
generated  by  Superslice  from  52  windows  were  considered. 

The  objects  were  hand  picked  to  belong  to  the  apriori  classes 
tank,  APC,  truck  and  noise,  and  were  then  classified  into 
five  classes,  viz.  Tank,  APC,  Truck,  Small  target,  and  Noise. 
The  corresponding  confusion  matrix  is  shown  in  Table  3.9.6a. 

We  then  applied  the  majority  logic  context  rule  on 
all  the  containment  forests  (52  of  them)  for  t = .5;  the 
resulting  confusion  matrix  is  shown  in  Table  3.9.6b. 

A comparison  between  the  two  matrices  shows  an 
improvement  in  the  false  dismissal  rate.  The  false  alarm 
rate  is  left  unchanged,  since  no  significant  nested  regions 
(for  t = .5)  could  be  found  where  the  noise  class  dominated 
the  target  class.  Within  the  target  classes  we  find  a 
marked  improvement  in  the  self-classification  of  tanks  and 
APC's.  However,  more  trucks  in  the  second  case  have  been 
misclassif ied  into  APC's.  This  is  presumably  not  due  to  an 
error  in  the  majority  logic  rule,  but  rather  due  to  the 
inability  of  the  classifier  to  discriminate  trucks  from  APC's. 
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A Priori  A Priori 


Classified  as 


Tank 

APC 

Truck 

Small 

Target 

Noise 

Tank 

28 

1 

2 

4 

19 

APC 

10 

26 

15 

35 

22 

Truck 

6 

10 

10 

27 

23 

Noise 

6 

1 

1 

7 

62 

Table  3.9.6a. 

Independent  classification  confusion 

Classified  as 

Small 

matrix 

Tank 

APC 

Truck 

Target 

Noise 

Tank 

40 

1 

0 

0 

13 

APC 

'13 

38 

11 

30 

16 

Truck 

6 

15 

6 

27 

22 

Noise 

6 

1 

1 

7 

62 

Table  3.9.6b. 

Majority 

logic 

classification  confusion  matrix 
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The  majority  logic  context  rule  is  not  necessarily 
a superior  classification  procedure,  since  Superslice  con- 
siders only  the  best  exemplars  and  may  therefore  produce  a 
better  classification.  However,  the  present  study  does 
support  the  relevance  of  low-level  context  for  classification 
validation. 
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3.10  The  Dynamic  Environment 


The  work  described  heretofore  has  considered  the 
analysis  of  single  frames.  However,  inasmuch  as  the  sensor 
is  capable  of  generating  30  frames  per  second  and  the 
hardware  is  capable  of  analyzing  about  3 frames  per  second, 
it  is  worthwhile  to  investigate  how  information  culled  from 
sequences  of  frames  can  improve  the  performance  of  the  system. 
There  are  two  ways  in  which  sequence  data  can  be  helpful. 

First,  a high  scanning  rate  allows  a succession  of  views  of 
the  same  scene  with  only  a small  amount  of  change  (dependent 
on  platform  motion).  Thus,  image  statistics  should  be  rela- 
tively stable  and  multiple  measurements  may  allow  a reduction 
of  the  standard  deviation  of  feature  values.  Second,  the  use 
of  motion  information  can  provide  a better  description  of  the 
object  regions  in  a scene.  For  this  project  only  a small 
data  base  of  ten  sequential  frames  was  available  (Figure  3.10.1). 
The  image  content  and  quality  are  similar  to  those  of  the  NVL 
data  base.  The  sequence  corresponds  to  every  other  frame 
from  the  FLIR  sensor  over  a span  of  2/3  of  a second.  The 
images  show  a tank  against  a background  of  trees,  and  fade  away 
more  with  each  frame.  While  this  data  base  was  not  large 
enough  to  permit  meaningful  tests,  it  did  allow  some 
exploratory  work. 
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Figure  3.10.1.  Ten  sequential  frames. 
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Figure  3.10.1  (continued) 
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3.10.1  Threshold  selection 


One  does  not  expect  (time-)  adjacent  frames  to  differ 
radically  and  therefore  it  should  be  possible  to  use  good 
thresholds  from  the  previous  frame  to  segment  the  current 
frame  or  at  least  to  guide  the  selection  of  thresholds  in 
the  current  frame.  A sequence  of  10  windows  was  extracted 
and  smoothed  (Figure  3.10.2)  and  a best  threshold  was  chosen 
for  each.  Figure  3.10.3  shows  the  effect  of  choosing  a 

lower  threshold  or  a higher  threshold.  As  may 
be  noted,  the  adjacent  thresholds  have  a fairly  negligible 
effect  on  the  target  region  although  there  is  a sizable 
change  in  the  amount  of  noise  (which  can  be  eliminated  by 
shrink/expand  noise  cleaning) . However,  if  one  considers  the 
sequence  of  best  thresholds  as  determined  by  the  border/edge 
match  score  (Table  3.10.1),  there  is  a large  shift  (from 
gray  level  27  to  17)  even  in  this  short  sequence  of  frames. 

Thus  no  single  threshold  is  appropriate  for  the  whole  sequence. 
Nonetheless,  the  previous  threshold  when  used  on  the  current 
frame  is  a fairly  good  choice.  This  suggests  the  following 
approach:  In  a single  pass  over  the  frame,  segment  the 

current  frame  using  the  best  threshold (s)  from  the  previous 
frame  and  simultaneously  compute  the  best  threshold (s)  for 
this  frame  (to  be  applied  to  the  next  frame  in  sequence) . 

The  advantage  of  this  scheme  is  that  the  frame  is  not  stored, 
thereby  realizing  a considerable  saving  in  chip  size  and 
complexity. 
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a . 


b. 


Figure  3.10.2a.  Ten  64x64  windows  from  the  sequential 

data  base. 

b.  5x5  median  filtered  windows  of  128x128 
originals,  then  sampled  2 to  1. 
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Figure  3.10.3.  Effect  of  choosing  lower  or  higher  thresholds. 

The  column  labeled  T shows  the  result  of  applying 
the  chosen  threshold  to  each  window  in  the 
sequence.  Columns  T-l,  T+l,  T+2  show  the  results 
of  using  thresholds  1 lower,  1 higher,  and  2 
higher,  respectively. 
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Threshold  Sequential  Window  # 


#1 

#2 

#3 

#4 

#5 

#6 

#7 

#8 

#9 

#10 

14 

26 

29 

34 

- 47 

35 

49 

54 

74 

68 

69 

15 

34 

29 

42 

35 

43 

48 

54 

88 

82 

79 

16 

52 

35 

48 

49 

43 

64 

72 

88 

83 

72 

17 

60 

43 

52 

51 

57 

74 

81 

84 

82 

81* 

18 

53 

51 

72 

58 

72 

72 

90 

90* 

84* 

69 

19 

54 

53 

76 

67 

70 

76* 

93* 

87 

71 

65 

20 

59 

60 

85 

75 

72 

59 

89 

66 

75 

65 

21 

67 

67 

87* 

85 

72 

59 

88 

66 

62 

22 

67 

66 

87* 

100* 

75* 

50 

80 

63 

50 

23 

67 

70 

87* 

100* 

68 

42 

80 

63 

24 

67 

72 

87* 

97 

68 

38 

80 

63 

25 

71 

70 

79 

97 

73 

33 

83 

61 

26 

76 

76 

81 

85 

69 

27 

79* 

79* 

62 

58 

68 

28 

76 

77 

64 

54 

68 

29 

75 

75 

62 

43 

52 

Table  3.10.1.  Percentage  border/edge  match  as  a function  of 
threshold  for  the  sequence  data  (maxima  indi- 
cated with  . 
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A somewhat  different  approach  attempts  to  dis- 
tribute N thresholds  across  the  threshold  range  dynamically. 
Suppose  the  threshold  range  is  x gray  levels.  It  would  take 
X/N  frames  to  investigate  each  threshold  in  the  range.  How- 
ever, as  mentioned  earlier,  X/N  is  likely  to  be  **  3 . Thus 
the  entire  gray  level  range  capable  of  harboring  targets  can 
be  sampled  every  3 frames.  At  a projected  processing  rate 
of  3 frames  per  second , the  range  would  be  sampled  once  per 
second.  A hybrid  approach  is  also  appropriate,  devoting  K of 
N thresholds  to  the  most  likely  gray  levels  and  letting  N-K 
thresholds  "rove"  over  the  rest  of  the  applicable  gray- 
scale. 
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3.10.2  Region  tracking 

The  Superslice  algorithm  builds  a forest-like 
structure  of  regions  from  each  frame.  Within  each  struc- 
ture, a sequence  of  nested  regions  which  are  roughly  similar 
in  size  (but  arising  from  different  thresholds)  constitutes 
a set  of  exemplars  of  a possible  object.  In  addition,  a 
certain  number  of  accidents  tend  to  be  present.  Regions  of 
either  type  are  called  "candidate  object  regions".  The 
frame  to  frame  tracking  process  attempts  to  discover  con- 
sistent temporal  sequences  of  candidate  object  regions  by 
selecting  one  exemplar  per  candidate  object  per  frame, 
according  to  a dynamic  programming  model  (see  [28]). 

Two  evaluation  functions,  S and  D,  are  used.  The 
static  evaluation  function  S (c ) defines  a figure  of  merit 
for  each  candidate  object  region  c.  The  Superslice  algorithm 
provides  such  a figure  of  merit  based  on  contrast  and  well- 
definedness.  The  assumption  is  that  the  best  exemplar  for 
an  object  region  is  identified  by  having  the  greatest  figure 
of  merit.  The  dynamic  evaluation  function  D(c,c')  defines 
the  similarity  of  one  candidate  object  region  (c)  to  another 
(c').  This  is  evaluated  by  considering  the  scaled  differences 
between  the  feature  vector  of  c and  that  of  c' . If  c is  a 
perfect  exemplar  then  S(c)  = 0 and  D(c,c)  = 0. 

Let  (c..;  j = 1,...,N.}  be  the  set  of  candidate  re- 
l j i 

gions  in  the  ith  frame,  i = 1,...,M.  We  define  the  dynamic 

programming  problem  as:  find  {c.^  ; i = 1,M)  such  that 

i 
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T^cMtt  ^ over  all  selection  functions,  tt. 


The  solution  is  achieved  by  the  following: 


Basis  step:  Ttc^)  = Sfc^j);  j = 1,..., 
Iterative  step:  T(c^+^j)  = 


r • • • r 


Ni 

s(ci+i,j)  + {T(cik> + D(c 


for  j = 1 


/ • • • t 


N 


i+1 


The  above  procedure  finds  the  minimum  cost  sequence 


of  candidate  object  regions.  Candidate  regions  which  are 
accidental  are  unlikely  to  persist  from  frame  to  frame;  thus 
their  D terms  are  likely  to  be  large,  thereby  increasing  the 
total  cost  of  any  sequence  which  includes  them.  Note  that 
there  will  be  many  sequences  which  are  only  slightly  more 
costly  than  the  minimum.  These  suboptimal  sequences  will  be 
based  on  other  exemplars  for  the  same  object.  The  optimal 
sequence  is  thus  optimal  for  the  particular  formulations  of 
S and  D.  Giving  more  weight  to  S and  less  to  D will  tend  to 
select  best  exemplars;  while  the  reverse  weighting  will  tend 
to  favor  frame  to  frame  consistency.  A semantic  model  can 
provide  guidance. 


In  general,  the  image  sequence  may  contain  more  than 


one  object.  The  scheme  described  above  identifies  the  "best" 
object  region  sequence.  In  order  to  extract  region  sequences 
corresponding  to  other  objects  in  the  image  sequence,  we 
must  delete  all  candidate  object  regions  accounted  for  by 
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the  optimal  sequence.  The  inherent  data  structure  specifies 
which  regions  are  exemplars  for  each  object.  By  deleting 
all  candidate  object  regions  in  each  frame  which  are  similar 
to  the  selected  region  of  the  optimal  sequence  (i.e.,  contain 
it  or  are  contained  it  it) , we  can  set  the  stage  for  another 
application  of  dynamic  programming.  This  process  is  re- 
peated until  only  very  poor  (high  cost)  sequences  are  obtained. 
Presumably  at  this  point  all  objects  have  been  accounted  for. 

Occasionally,  a deletion  step  may  leave  a particular 
frame  empty  of  candidate  object  regions . This  may  occur  for 
two  reasons:  All  objects  were  accounted  for  by  the  last 
dynamic  programming  step,  or  the  candidate  region  proposer 
failed  to  elicit  an  exemplar  for  an  actual  object.  In  the 
former  case,  the  process  will  have  terminated.  The  latter 
case  can  be  handled  by  associating  a fixed  "empty  frame" 
cost  which  is  the  price  paid  for  skipping  a frame.  Of  course, 
one  can't  know  which  case  applied.  The  conservative  approach 
is  always  to  assume  the  second  case  and  apply  the  empty  frame 
cost.  The  termination  criterion  will  then  be  based  on  a 
threshold  for  the  total  cost. 

The  problem  of  an  object  leaving  the  field  of  view 
can  be  handled  in  a different  manner  by  flagging  candidate 
object  regions  which  lie  on  the  border  of  the  image.  A 
partial  sequence  whose  last  element  is  flagged  but  which 
overall  has  low  cost  can  be  accepted  as  depicting  an  object 
which  has  moved  off  the  image. 
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The  dynamic  programming  algorithm  described  above  has 
been  implemented  and  tested  on  a sequence  of  ten  windows  of 
FLIR  data  containing  a tank  (Figure  3.10.4).  These  windows 
were  already  smoothed  by  a 3x3  median  filter  to  provide 
better  response  to  thresholding.  The  Superslice  algorithm 
extracted  a modest  number  of  candidate  object  regions. 

Figure  3.10.5  displays  these  regions  (although  for  nested 
sequences  only  the  best  static  exemplar  is  displayed) . The 
solution  to  the  dynamic  programming  problem  was  computed  and 
the  exemplars  which  correspond  to  the  solution  are  shown  in 
Figure  3.10.6.  There  are  of  course  many  suboptimal  solutions 
which  are  quite  similar  to  this  one.  Their  cost  is  not 
significantly  greater  than  the  minimal  cost.  When  the  in- 
dicated regions  were  deleted  along  with  all  other  similar 
candidates,  the  only  remaining  regions  corresponded  to  noise 
and  any  minimal  cost  path  attempting  to  span  several  frames 
was  substantially  more  costly  than  the  optimal  path  or  any 
of  its  similar  suboptimal  paths.  It  seems  reasonable  there- 
fore to  establish  thresholds  for  static  and  dynamic  cost  in 
order  to  prune  the  search  space. 
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Figure  3.10.4.  A sequence  of  10  median  filtered  windows  of  a tank. 


Figure  3.10.5.  Output  of  the  Superslice  algorithm. 


Figure  3.10.6.  Optimal  sequenced  regions  using  dynamic  programming. 
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3 . 11  Image  processing  software 


3.11.1  Mini-XAP  and  Micro-XAP 

Software  engineering  experience  has  demonstrated 
the  many  advantages  of  modularized  design  with  clearly  de- 
fined software  interfaces.  This  approach  was  taken  in  the 
design  of  a picture  processing  software  system  named  Mini-XAP. 
The  system  is  intended  for  use  as  a flexible  tool  in  general 
picture  processing  research  efforts. 

Mini-XAP  is  implemented  on  a PDP  11/45  minicomputer. 
Picture  files  can  be  stored  locally  on  disk  or  9-track  tape. 
Pictures  may  be  output  to  either  a TV  monitor  or  a precision 
CRT  film  recorder.  Medium  speed  communication  lines  to 
UNIVAC  1106  and  UNIVAC  1108  computers  provide  additional 
paths  for  picture  data  transfer  and  for  program  development 
activities . 

Mini-XAP  consists  of  a number  of  interface  levels 
which  mediate  between  the  user,  the  operating  system  and 
the  image  processing  subsystem.  Briefly,  it  includes: 

VOS,  a multi-user  virtual  operating  system 

developed  at  the  University  of  Maryland; 

DBM,  a data  base  manager  for  image  storage  on  disk 
and  tape  and  for  handling  picture  I/O; 

ULI,  the  user  language  interface  which  allows  communi- 
cation between  the  DBM  and  LISP,  the  inter- 
active high  level  programming  language; 
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The  algorithm  skeleton,  a program-like  structure  for 
common  image  transformations  in  which  certain 
arguments  and  functions  are  left  unspecified 
until  the  skeleton  is  executed; 

Application  packages,  which  offer  the  user  access  to 

predefined  high-level  image  processing  software. 

A complete  discussion  of  Mini-XAP  is  presented  in  [29] . 

Micro-XAP  is  a PDP-11/45  image  access  facility  for 
Fortran  users  under  the  UNIX  operating  system.  A more  general 
facility,  Virtual  Arrays,  has  been  developed  for  LISP  users 
under  UNIX.  Documentation  of  these  image  handling  software 
systems  can  be  found  in  [30] . 
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3.11.2  Viewmaster  - a system  for  building  image 


processing  programs 

Viewmaster  is  a generalized,  concatenative  local 
neighborhood  image  processing  coordinator.  The  system  is 
based  on  the  premise  that  much  image  processing  consists  of 
sequences  of  transformations  defined  on  local  neighborhoods 
of  each  image  point.  Within  a sequence,  the  output  image 
of  each  transformation  is  fed  into  the  next.  A conventional 
approach  to  development  of  such  programs  is  to  create  inter- 
mediate image  products  corresponding  to  the  output  of  each 
transform.  This  tends  to  load  the  I/O  channels  and  waste 
mass  storage. 

Our  approach  is  to  view  local  neighborhood  operations 
as  modules  with  similar  input/output  requirements.  Saving 
of  intermediate  images  is  avoided  by  devoting  extra  core 
storage  to  buffer  the  needed  rows  of  these  "virtual"  images. 
The  generalized  structure  of  the  processing  environment  is 
as  a set  of  cooperating  processes  with  system-like  require- 
ments for  resource  allocation,  race  condition  management, 
and  memory  utilization.  The  procedural  control  structure 
is  represented  by  a directed  acyclic  graph.  Pairs  of  adja- 
cent processes  communicate  through  data  blocks  describing 
the  shared  virtual  image.  Thus  a virtual  image  is  "produced" 
by  a single  process  and  "consumed"  by  one  or  more  processes. 
This  involves  handling  constraints  which  are  not  normally 
dealt  with  in  buffer  control  systems. 
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A number  of  benefits  derive  from  this  approach. 

. Image  processing  programs  can  be  written  without 
extensive  knowledge  of  image  I/O. 

. Common  image  operations  can  be  invoked  by  name 
(through  a preprocessor)  and  concatenated. 

. Program  modules  can  be  developed  and  tested  inde- 
pendently and  then  integrated  simply. 

. Sharing  of  program  modules  is  encouraged  by  the 
uniform  operation  environment. 

. Available  core  storage  is  allocated  and  used 
efficiently.  Mass  storage  serves  as  a back-up 
only  as  necessary. 

. File  and  directory  space  for  intermediate  image 
products  can  be  reduced  or  eliminated. 

For  a fuller  description,  see  [31]. 
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3.12  Recommended  algorithm  and  flowchart 


This  subsection  summarizes  the  various  steps  which 
make  up  the  target  detection  algorithm  described  in  Section 
3. 


Acquire  the  image  from  the  sensor 
Sample  the  image  2 to  1 

Smooth  the  resultant  image  with  a 5x5  median  filter 
Compute  the  edge  map: 

Apply  a 4x4  dif ference-of-averages  edge  detection, 
selecting  the  maximum  magnitude  response  over 
four  directions 

Thin  the  edge  response  using  local  directional 
non-maximum  suppression 
Compute  a threshld  range 
Histogram  the  image 

Identify  the  mode  and  the  maximum  significant 
gray  level 

Select  K thresholds  from  the  range 

Evenly  space  K thresholds  in  the  range 
Build  the  containment  forest  of  object  region  feature 
vectors  as  follows: 

For  each  threshold  (starting  at  the  lowest) : 
Threshold  the  image 

Extract  feature  vectors  of  connected  compon- 
ents of  above  threshold  points 
Accumulate  primitive  features  as  each  compon- 
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ent  is  colored 


Select  object  region  vectors  using  thresholds 
on  size,  edge/border  cooccurrence  and  con- 
trast 

Make  each  vector  point  to  its  predecessor  (if 
any)  in  the  containment  tree 
For  each  containment  tree  in  the  containment  forest: 

For  each  exemplar  vector: 

Classify  the  feature  vector  using  the  classi- 
fier 

If  any  feature  vector  in  the  containment  tree  has 
been  classified  as  a target  then  signal  a target 
cue  for  the  corresponding  region  on  the  display 
Select  the  most  "well-defined"  exemplar  in 
the  containment  tree  (using  the  edge/border 
match  criterion) 

Classify  the  target  region  according  to  the 
classification  of  this  best  exemplar  (tank, 
truck,  APC , unknown,  small) 

Else  classify  the  containment  tree  as  noise 

IFEND 
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4 • HARDWARE  TECHNOLOGY 


This  section  of  the  final  report  was  prepared  by  the  Westinghouse  Systems 
Development  Division  for  the  Computer  Science  Center,  University  of  Maryland. 

It  contains  a review  and  summary  of  a 21-month  effort  to  generate  and  implement 
automatic  target  cueing  algorithms,  involving  three  phases,  as  follows: 

Phase  I:  Task  and  Technology  Review  (3  months) 

Phase  II:  Algorithm  Selection  and  Test  (9  months) 

Phase  III:  Hardware  Development  (9  months) 

All  phases  of  the  program  were  completed  during  the  prescribed  period, 
including  the  construction  and  demonstration  of  an  important  recognition 
function  using  charge-coupled  devices. 

The  success  of  the  program  is  due,  in  large  part,  to  the  close  coordination 
between  members  of  the  government-university-industry  team.  The  principle 
team  members  from  the  government  were  Mr.  John  Dehne  and  Dr.  George  Jones  of 
NVL.  For  the  University  of  Maryland,  principle  members  were  Profs.  Azriel 
Rosenfelt  and  David  Milgrara.  The  Westinghouse  team  included  Dr.  Glenn 
Tisdale,  Program  Manager,  Mr.  Thomas  Willett,  Project  Engineer,  Dr.  Nathan 
BJuzer,  and  Dr.  Gerald  Borsuk. 

This  section  of  the  report  was  prepared  by  Thomas  Willett  and 
Glenn  Tisdale. 

4.1  Introduction 

We  will  begin  this  section  by  reviewing  the  nature  of  and  the  need  for 
automatic  target  cueing.  This  will  lead  to  a specification  of  system  design 
goals  in  Section  4.2.  Algorithm  design  considerations  will  be  examined 
in  Section  4.3,  including  both  system  data  processing  requirements  and 
their  implementation  in  suitable  hardware.  The  process  of  hardware  fabri- 
cation will  be  covered  in  Section  4.4.  Conclusions  and  recommendations  are 
offered  in  Section  4.5. 

Definition  of  Automatic  Target  Cueing 

Before  discussing  the  design  goals  and  hardware  implementation  for 
an  automatic  target  cueing  system,  it  is  useful  to  examine  its  character 
and  purpose. 
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As  the  tools  used  in  reconnaissance  and  target  acquisition  operations 
continue  to  improve,  it  is  becoming  increasingly  clear  that  man  himself  is  a 
performance  bottleneck.  Sensors  provide  greater  resolution,  dynamic  range,  and 
speed;  and  the  techniques  that  make  effective  use  of  sensor  information,  such 
as  communication  links  and  weapon  delivery  systems,  are  also  being  advanced. 
However,  acquisition  of  target  material  largely  depends  on  the  ability  of  a human 
operation  to  search  sensor  imagery  in  real  time.  If  he  misses  targets,  or 
identifies  them  incorrectly,  mission  performance  is  degraded. 

One  way  to  improve  this  situation  is  to  provide  the  human  with  cues  to 
locate,  and  possibly  identify,  targets  on  his  displays.  Such  cues  might 
originate  with  a priori  information,  or  from  the  detection  of  peculiar  target 
conditions  such  as  motion,  hot  spots,  or  electromagnetic  radiation.  However, 
a more  general  approach  is  to  apply  automatic  recognition  techniques  to  the 
imagery.  As  shown  in  Figure  4-1,  the  image  processor  serves  as  an  information 
filter  on  the  image,  alerting  the  human  to  the  presence  of  potential  targets, 
possibly  by  audible  signals  initially,  and  then  by  providing  visual  cues  or 
overlays  on  his  display.  In  this  process,  the  final  verification  of  target 
identity  is  reserved  for  the  human  operator.  The  image  processor  serves  to 
assist  him. 

Automatic  cueing  can  be  carried  out  either  in  airborne  or  ground  loca- 
tions. In  the  airborne  situation,  the  operator  views  a CRT-type  display  for 
acquisition  of  targets  on  a real-time  basis.  His  determination  may  result  in 
action  in  a matter  of  seconds,  either  offensive  or  defensive.  On  the  ground, 
interpretation  may  be  required  in  real-time,  or  on  a more  relaxed  basis.  In 
the  proposed  operation  of  remotely  piloted  vehicles  (RPV’s),  for  example,  a 
video  link  may  be  used  to  obtain  a CRT  presentation  at  a ground  station  of  the 
output  of  a sensor  located  on  the  vehicle.  The  problems  for  the  operator  are 
somewhat  similar  to  the  airborne  situation;  however,  his  appraisal  of  the 
sensor  image  is  entirely  limited  to  the  CRT  output.  He  can’t  look  at  the  tar- 
get area  directly.  On  the  other  hand,  he  is  not  distracted  by  problems  such 
as  vehicle  operation  and  personal  security. 

In  a more  relaxed  mode,  cueing  of  wide  area  high  resolution  reconnaissance 
imagery  might  be  used  to  greatly  reduce  the  time  required  in  scanning  imagery. 

In  this  case,  the  results  of  the  processing  is  desired  in  a matter  of  minutes, 
so  as  to  respond  to  the  motion  of  mobile  targets. 
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4.2  System  Design  Goals 

Key  considerations  in  the  design  of  an  automatic  target  cueing  system 
are  its  performance,  physical  characteristics,  and  allowable  cost.  We  will 
deal  with  each  of  these  considerations  in  this  paragraph.  However,  a quantita- 
tive determination  of  design  parameters  will  depend  on  the  manner  in  which  the 
mission  is  implemented.  Such  implementation  will  be  discussed  first. 

As  explained  in  Section  4.1,  the  target  cueing  function  might  be  performed 
aboard  a vehicle,  or  at  a ground  station  if  imagery  is  relayed  for  analysis. 

In  either  case,  the  performance  goals  will  tend  to  be  comparable.  As  regards 
physical  characteristics  and  cost,  however,  the  vehicle  location  will  demand 
much  tighter  restrictions.  Our  discussion  will  proceed  on  the  basis  of  the 
vehicular  application.  Both  helicopters  and  high-speed  aircraft  are  airborne 
candidates.  The  RPV  image,  on  the  other  hand,  will  be  analyzed  at  a ground 
station;  therefore,  the  physical  limitations  within  the  RPV  are  not  a problem. 

As  the  state  of  the  art  in  automatic  target  recognition  develops,  and  high 
levels  of  performance  are  attained,  it  is  anticipated  that  the  human  observer 
will  eventually  be  eliminated  in  some  applications.  For  example,  recognition 
equipment  might  be  placed  aboard  a missile  for  unaided  terminal  guidance. 

The  requirement  for  high  performance,  small  size  and  weight,  low  power  consump- 
tion, and  low  cost  will  all  apply  in  this  case. 

Performance  Goals 

Key  performance  parameters  are  the  detection  and  recognition  rates  for 
targets  of  interest,  the  false  alarm  rate,  and  the  speed  of  operation  of  the 
cueing  system.  Detection  occurs  when  a target  of  any  kind  is  indicated  by 
the  cueing  system,  while  recognition  occurs  when  the  correct  target  class 
is  selected  from  among  several  possible  classes.  Detection  and  recognition 
performance  is  expressed  as  a percentage  of  the  targets  which  are  actually 
available.  A false  alarm  occurs  when  a target  is  indicated  even  though  none 
is  present.  The  false  alarm  rate  is  expressed  on  the  basis  of  a unit  of 
elapsed  time  or  area  of  coverage.  The  required  speed  of  operation  is  determined 
by  the  time  available  to  the  operator  to  make  decisions,  the  search  area  to  be 
covered  by  the  sensor,  and  the  sensor  frame  rate.  It  depends  heavily  on  human 
factors  considerations,  such  as  decision  times  and  reaction  times,  and  the 
choice  of  prioritization  ground-rules. 
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In  the  discussion  in  the  first  quarterly  report,*  it  was  noted  that  the 

cueing  system  must  significantly  improve  the  search  capability  of  the  unaided 

human  interpreter  if  it  is  to  be  cost  effective.  The  problem  is  to  locate 

targets  which  occupy  a tiny  area  in  the  sensor  field  of  view,  and  are  embedded  in 

background  clutter  and  noise.  The  performance  of  the  interpreter  is  limited  by 

visual  scanning  rates,  eye  fatigue,  and  conflicting  duties.  References  were 

cited  which  tend  to  indicate  rather  low  target  acquisition  rates  (such  as 

33  percent  detection  and  25  percent  recognition)  under  realistic  conditions. 

Although  no  data  are  available  which  provide  a direct  comparison  of  operator 

performance  with  and  without  cueing,  there  is  growing  evidence  that  cueing 

2 3 

systems  could  more  than  double  this  performance.  * If  this  evidence  is 
substantiated,  then  the  outlook  for  a cost-effective  cueing  system  appears 
very  promising.  However,  a final  determination  depends  on  combining  specific 
system  and  mission  parameters. 

We  are  particularly  concerned  with  the  airborne  mission,  where  space 

and  weight  available  for  the  equipment  are  at  a premium.  Mr.  Donald  Looft, 

Deputy  Director  of  DARPA,  has  described  airborne  scenarios  for  helicopters 

4 

and  high-speed  aircraft  as  follows: 

"Now  visualize  a likely  scenario  for  the  one-man  pilot  observer. 

He  must  direct  the  sensor  and  visually  scan  his  display  for  tar- 
gets which  most  assuredly  are  concealed  by  terrain  or  background 
clutter.  He  must  fly  low  and  as  fast  as  possibly  to  minimize  ex- 
posure to  radar  controlled  weapons,  say  in  the  helicopter  case, 
speeds  of  60  to  80  knots  at  altitudes  of  100  to  200  feet  are  typi- 
cal. In  the  case  of  attack  aircraft,  speeds  of  400  to  500  knots 
at  altitudes  of  two  to  three  kms.  (6560  - 9840  feet)  are  normal 
flight  profiles,  though  the  latter  can  also  be  "down  on  the  deck". 


Algorithms  and  Hardware  Technology  for  Image  Recognition, 

Westinghouse  Quarterly  Report  on  Contract  DAAAG53-76-C-0138, 

1 May  through  31  July  1976,  page  2-2. 

2 

Tisdale,  G.E.,  A Digital  Image  Processor  for  Automatic  Target 
Cueing,  Navigation,  and  Change  Detection,  Proceedings  of  the 
SPIE  Symposium  on  Tactical  Reconnaissance,  Reston,  Virginia, 

April  20,  1977. 

3 

Algorithms  and  Hardware  Technology  for  Image  Recognition,  Quarterly 
Management  Report  by  Univ.  of  Maryland,  1 August  - 31  October,  1977, 
Par.  2.3. 

4 

Looft,  D.J.,  Image  Understanding  — A Perspective  of  POD  Needs,  banquet 
speech  presented  at  the  Sixth  Annual  AIPR  Symposium,  College  Park, 
Maryland,  June  1,  1976. 
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"At  the  same  time,  the  man  or  the  crew  is  performing  these  flight 
maneuvers  which  are  "hairy"  in  broad  daylight,  let  alone  darkness, 
he  must  monitor  his  aircraft  instruments,  he  must  be  wary  of 
terrain  obstacles,  and  he  may  well  be  under  enemy  fire.  Ideally, 
the  pilot  wants  to  acquire  and  engage  targets  on  a single  pass. 

If  he  must  take  several  passes,  his  survivability  is  greatly 
reduced.  If  he  has  been  directed  to  a probable  target  area  and  is 

able  to  take  advantage  of  terrain  masks  he  must  "pop  up",  make 

his  observations,  acquire  and  engage  targets  in  times  like  15  or  20 
seconds.  He  would  always  like  to  acquire  and  engage  enemy  targets 
at  sufficient  standoff  distance  to  be  out  of  range  of  radar 
controlled  enemy  air  defense  weapons.  This  means  he  is  usually  at 
or  near  the  threshold  recognition  range  of  the  sensor." 

A detailed  examination  of  the  trade-offs  between  the  required  cuer 
processing  rate  and  the  allowable  false  alarm  rates  was  presented  recently  by 
Dehne  et  al.  of  NVL^.  For  a set  of  mission  parameters  which  relate  primarily 
to  the  helicopter  scenario,  it  was  found  that  for  processing  rates  between  3 

and  10  frames  per  second,  from  0.5  to  1.8  false  alarms  could  be  accommodated 

per  frame.  These  results  assumed  that  20  seconds  were  available  to  cover 
a large  search  window,  resulting  in  about  0.7  seconds  to  handle  each  frame. 

This  figure  includes  the  frame  processing  time,  the  time  to  slew  the  sensor, 
the  operation  decision  time  per  false  alarm  (0.2  seconds)  and  his  reaction 
time  to  advance  to  the  next  frame  (0.2  seconds).  The  report  considers 
sequential  as  well  as  combined  processing  and  slew,  with  the  former  preferred. 

A separate  consideration  with  the  above  processing  rates  is  that 
the  cueing  symbols  superimposed  on  the  display  must  appear  in  the  correct 
location  even  after  the  processing  delay.  With  a frame  rate  of  30  per  second, 
the  delay  covers  3 to  10  frames  (0.1  to  0.3  seconds).  Misregistration  of  target 
and  symbol  could  be  caused  by  target  motion  relative  to  the  terrain,  or  by  the 
changes  in  the  field  of  view  due  to  aircraft  motion.  Broadside  motion  of  the 
target  is  generally  the  worst  case.  Suppose  the  cueing  window  subtends  twice 
the  extent  of  the  target  on  the  display  in  both  the  horizontal  and  vertical 
dimensions,  and  is  located  at  the  point  of  the  target  center  in  the  processed 
frame.  It  can  move  by  half  its  dimension  in  any  direction  and  still  be 
contained  within  the  cueing  window.  For  a vehicle  20  feet  in  length  which 
subtends  20  pixels  in  the  display, 

\)ehne,  J.S.,  Van  Atta,  P.  and  Raimondi,  P.,  Specifying  Image  Processing 
Systems  for  Thermal  Imagers,  paper  presented  at  the  Seventh  Annual 
Symposium  of  the  EIA-AIPR  Committee,  College  Park,  Md.  22-24  May  1977. 
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motion  of  10  feet  must  be  accommodated  over  a worst  case  period  of  0.3  seconds, 
with  a corresponding  allowable  broadside  speed  of  30  feet  per  second  (43  mph) . 

This  result  is  independent  of  range  if  the  window  is  proportional  to  target 
size. 

The  report  also  considers  the  use  of  a wide  sensor  field  of  view  for 
cueing,  followed  by  operator  confirmation  with  the  narrow  field  of  view.  It 
is  assumed  in  this  case  that  the  capability  of  the  cuer  for  recognition  exceeds 
that  of  the  operator  by  an  amount  sufficient  to  compensate  for  the  increased 
field  of  view.  Under  these  conditions,  one  false  alarm  per  frame  could  be 
accommodated  with  a processing  rate  of  0.54  frames  per  second  (about  a 10:1 
reduction  over  the  previous  case).  However,  at  the  present  state  of  the  art, 
this  improved  cuer  performance  relative  to  the  operator  has  not  been  demonstrated. 
In  that  regard,  it  is  noted  that  because  of  the  eye  integration  time  of  0.2 
seconds,  the  operator  gains  a signal-to-noise  improvement  over  the  cuer  of, 
perhaps,  2.5. 

A final  approach  considered  a sensor  with  an  expanded,  high  resolution  scan 
area  equivalent  to  the  target  search  window.  Due  to  display  limitations, 
the  operator  sees  either  a low  resolution  version  of  the  entire  window,  or  a 
small  segment  containing  potential  targets  as  selected  by  the  cuer.  The  cuer 
processing  rate  is  not  greatly  affected  by  this  mechanization. 

For  the  assumed  frame  size  of  375  by  500  pixels,  the  processing  rates  of 
3 to  10  frames  per  second  correspond  to  data  handling  rates  of  0.6  to  1.9 
megapixels  per  second. 

The  foregoing  discussion  was  addressed  to  the  helicopter  scenario.  For 
the  high-speed  aircraft,  the  available  search  time  will  tend  to  be  lower,  but 
the  required  search  window  will  probably  also  be  reduced.  The  reduced  search 
window  can  be  achieved  by  reliance  on  navigation  aids  for  the  acquisition  of 
predesignated  targets.  At  the  high  speeds  and  possible  low  altitudes,  it 
appears  that  the  single-seat  operator,  because  of  the  burden  of  aircraft 
navigation,  will  be  assisted  significantly  by  the  cueing  system.  Frame  rates 
which  are  comparable  to  his  reaction  time,  or  somewhat  lower,  in  the  vicinity  of 
one  per  frame  should  be  tolerable  from  the  point  of  view  of  overall  processing 
time.  However,  from  the  point  of  view  of  increased  detection  and  recognition 
rates,  as  well  as  reduced  false  alarm  rates,  it  appears  useful  to  consider  the 
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integration  of  results  from  successive  frames.  The  assignment  of  a priority 
weighting  to  target  cues  will  improve  the  probability  that  important  targets 
will  be  considered  when  a number  of  opportunities  occur. 

Physical  Characteristics 

The  significant  physical  characteristics  of  the  cueing  system  for  air- 
craft or  missile  use  include  size,  weight  and  power  consumption. 

The  increased  availability  of  general-purpose  MSI  circuits  has  made  it 
possible  to  offer  existing  cuer  algorithms,  using  conventional  packaging  tech- 
niques, in  packages  which  should  be  acceptable  for  aircraft  use.  A total  system, 
excluding  displays,  might  be  expected  to  occupy  a volume  of  0.5  to  1.0  cubic 
feet,  and  to  weigh  15  to  30  lbs.,  including  power  supply.  Power  in  the 
neighborhood  of  200  to  300  watts  will  be  required. 

For  missile  applications,  conventional  packaging  can  be  improved  upon 
by  use  of  flat  packs,  or  bare  chip  packaging,  and  by  the  introduction  of 
some  specially  designed  chips. 

One  thrust  of  the  present  Westinghouse  program,  however,  has  been  to 
determine  the  necessary  area  of  silicon  substrate  required  to  provide  cuer 
functions.  As  will  be  described  in  Section  4.4,  the  fabrication  of  special  CCD 
LSI  circuits  appears  feasible,  and  would  reduce  the  cuer  to  an  overall  chip 
area  of  a few  square  inches.  On  this  basis,  introduction  of  cueing  functions 
into  an  artillery  shell,  for  "f ire-and-target"  performance,  becomes  feasible. 
Allowable  Cost 

In  Section  2.4  of  the  first  quarterly  report  on  this  program,  an  attempt 
was  made  to  relate  the  allowable  cuer  cost  to  its  mission  value.  The  argument 
held  that  if  the  operator’s  performance  was  significantly  improved  with  the 
use  of  cueing,  then  the  value  of  cueing  becomes  a significant  fraction  of  the 
overall  cost  of  flying  a series  of  missions  (a  very  large  figure).  Unfortunately, 
the  improvements  associated  with  cueing  have  not  yet  been  measured,  and  the 
assignment  of  an  improvement  factor  will  no  doubt  be  elusive.  Furthermore, 
an  upper  cost  limit,  which  is  much  lower,  can  realistically  be  anticipated  for  any 
new  addition  to  a vehicle  complement. 
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Since  the  cueing  system  is  a digital  processor,  its  production  cost 
using  conventional  packaging  techniques  can  be  estimated  reasonably  well. 

For  helicopter  or  aircraft  use,  a figure  of  $20K  to  $50K  per  unit  is  suggested. 
Reductions  in  size  by  increased  use  of  LSI  will  tend  to  reduce  the  recurring 
cost  for  each  unit,  but  at  the  expense  of  a significant  nonrecurring  cost  for 
initial  development. 

Implementation  of  a complete  cueing  system,  using  CCD  circuits  on  a small 
number  of  silicon  chips,  takes  this  sequence  one  step  further.  The  final  unit 
recurring  cost,  in  production,  might  range  from  $1K  to  $5K,  including  test, 
but  the  development  program  would  be  a multimillion  dollar  affair.  Before 
such  an  investment  could  be  considered,  a number  of  hurdles  would  have  to  be 
overcome.  First,  the  satisfactory  operation  of  the  system  would  have  to  be 
established.  Next,  an  attempt  should  be  made  to  compare  the  performance  of 
competitive  approaches,  since  only  one  design  can  probably  be  initiated. 

Finally,  a variety  of  applications  should  be  considered,  so  that  the  develop- 
ment cost  can  be  divided  as  much  as  possible. 

A practical  approach  to  this  dilemma,  which  has  been  initiated  in  the 
present  program,  consists  of  the  selection  and  implementation  of  key  circuit 
functions  in  CCD  form.  These  circuits  can  hopefully  be  used  in  hybrid  arrange- 
ments to  reduce  the  size,  weight,  and  power  of  early  cuer  designs.  With  the 
growing  availability  of  new  chips,  these  values  will  continually  decrease. 

At  the  same  time,  the  solution  of  a variety  of  application  problems  will  be 
possible  from  the  library  of  available  chip  designs. 

4.3  Algorithms,  and  Hardware  Implementation 

This  section  describes  a preferred  set  of  algorithms  developed  by  Maryland 
which  tentatively  comprises  the  first  portion  of  a cueing  system.  A system  flow- 
chart is  shown  in  Figure  4.3-1.  A description  of  data  flow  and  storage  require- 
ments is  included. 
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Figure  4.3-1.  System  Flowchart 


In  general,  the  Median  Filter  acts  to  suppress  noise.  The  Gradient  Operator 
extracts  edges  which  are  then  thinned  by  the  Non-Maximum  Suppression  Algorithm. 
At  the  same  time  a set  of  gray  levels  is  determined  and  the  filtered  image  is 
thresholded  at  each  gray  level.  A Connected  Components  Algorithm  partitions 
each  of  the  thresholded  images  into  potential  object  regions.  The  Super  Slice 
Algorithm  correlates  perimeter  points  formed  independently  by  the  Non-Maximum 
Suppression  and  Connected  Components  Algorithms  and  a score  is  obtained  for 
each  gray  scale  threshold.  These  scores  and  several  other  algorithms  form  a 
set  of  Classification  Logic.  A short  description  of  each  algorithm  follows. 


4.3.1  Gradient  Operator 

The  Gradient  Operator  is  an  edge  detector  which  is  defined  as  GRAD  OP  = 
max  { | A-B | , | C-D | } where  A,  B,  C and  D each  represent  the  sums  of  overlapping 
regions  of  4 X 4 pixels  each,  as  seen  in  Figure  4.3-2. 
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Figure  4.3-2.  Gradient  Operator 

The  value  o\  GRAD  OP  is  placed  in  the  pixel  location  marked  Mx"  which 
pixel  to  the  left  and  above  the  center  of  the  entire  region.  The  key 
operation  in  GRAD  OP  is  the  formation  of  the  difference 

A-BsC  0 if  |A|  < |B| 

( A-B  | A | > | B | 

which  is  realized  on  a silicon  substrate  with  the  configuration  shown 
Figure  4.3-3.  D^  is  a diffusion  diode  through  which  charge  is  injected  into 
the  chip;  A and  B are  gates  whose  potentials  are  controlled  by  voltages  represent- 
ing the  sums  A and  B,  respectively.  These  gates  will  form  a trap  to  retain 
some  of  the  injected  charge.  The  trapped  charge  is  equal  to  A-B  and  is  removed 
by  the  transfer  gate. 


is  one 
arithmetic 

(4.3-1) 

in 


A:: 


B;; 


i i.±: 


Transfer 

Gate 


Figure  4.3-3.  Subtraction  Module 


4-11 


Let  us  consider  a physical  analogy  in  which  Din,  A,  B,  and  TG  are  a set  of 
steps,  the  height  of  each  step  represents  the  potential,  and  the  charge  is 
represented  as  water. 


Figure  4.3-4b.  Water  Trapped 


When  the  height  (potential)  of  step  (gate)  A is  raised  above  that  of  step  (gate) 
B,  water  (charge)  is  trapped  above  B in  an  amount  corresponding  to  A-B.  Figure  A 
4.3-4  a,  b represents  both  conditions  of  Equation  (4.3-1). 

The  algorithm  B-A  = \ 0 if  |b|  < | A | 

[B-A  if  |b|  > j A I 

requires  another  silicon  substrate  in  which  the  gate  positions  of  A and  B are 
reversed.  The  block  diagram  of  the  absolute  difference  operator  |A-B|  is  shown 
in  Figure  4.3-5. 


Figure  4.3-5.  Absolute  Value  Circuit 
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If  V < IbL  then  B *=  0,  A ® B-A  and  A + B « B-A.  Similarly,  if 
1 1 out  out  out  out  J 

} A | > |b|,  then  B *»  A-B,  A =0  and  A + B « A-B.  These  two  statements 
1 1 out  * out  out  out 

form  an  absolute  value  operator.  A similar  operation  can  be  formed  for  |c-d|, 

but  GRAD  OP  really  compares  |a-B|  and  |C-D|  and  chooses  the  maximum  since  it 

is  defined  as  max  { | A-B  J , | C-D | } . Figure  A. 3-6  shows  the  GRAD  OP  block  diagram, 
and  Figure  4.3-5  shows  the  outpu  s at  various  locations. 


Output 


Figure  4.3-6.  CCD  Gradient  Operator 
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Figure  4.3-7.  Output  of  GRAD  OP  at  Various  Locations 
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4.3.2  Median  Filter 


The  Median  Filter  acts  to  extract  the  median  value  from  a 5 x 5 pixel 
window  and  place  that  value  in  the  center  pixel  location;  the  Filter  can  be 
considered  as  a smoothing  operator.  The  Filter  quantizes  each  of  the  23 
analog  signals  into  a number  of  discrete  units  and  then  sorts  the  23 
quantized  signals  by  arranging  them  in  a descending  order  of  magnitude, 
a.  Charge  Quantizer 

The  silicon  substrate  forming  the  Quantizer  is  shown  in  Figure  4.3-8. 


Figure  4.3-8.  Charge  Quantizer.  Symbols  are  Identified  in  the  Text 

is  the  diffusion  diode  through  which  charge  is  injected  into  the  chip 

and  into  the  holding  well,  HW.  DC  blocks  the  charge  from  leaving  via  D.  . 

m 

An  amount  of  charge,  Q,  proportional  to  S,  the  signal  voltage,  is  removed 
from  HW  via  the  transfer  gate,  TG,  and  placed  in  the  signal  well,  SW. 

Via  the  blocking  gate,  BG,  and  the  thimble  well,  TW,  a discrete  amount  of 
charge,  q,  is  removed  and  placed  in  well  g^.  Another  quantum  q is  removed 
from  SW,  and  placed  in  g^  while  the  first  charge  has  been  shifted  to  g^- 
This  process  is  repeated  until  SW  is  empty  and  all  the  charge  has  been 
placed  in  a number  of  discrete  wells  g^,  g^  •••  g^. 
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Figure  4.3-9  Flow  Analogy  to  Charge  Quantizer 

proportional  to  the  signal  voltage  S .being  poured  into  a tray  of  quantized  bins. 
When  a bin  is  filled  with  water  the  water  flows  over  the  top  into  the  next  bin. 

The  volume  of  water  is  divided  between  a number  of  discrete  bins. 

b.  Sorter 

Recall  that  the  contents  of  wells  g.  , g„,  ...  6 °f  the  quantizer 

1 L n 

(Figure  4.3-8)each  contain,  at  most,  an  amount  of  charge  q.  The  contents  of 

each  well  is  parallel  shifted  into  its  corresponding  channel  of  the  sorter 

(Figure  4.3-10)  so  that  if  there  were  q charge  in  g^»  there  is  now  q charge  in 

and  so  on.  Forming  traps  as  described  in  Section  4.3.1  with  wells  Pg^>  P^* 

and  P _,the  charge  in  each  channel  is  shifted  into  the  large  holding  well  (LHW) . 

83 

The  large,  holding  well  is  partitioned  into  N channels  also.  Consider  a numerical 

example;  a sequence  of  numbers  4,  7,  5 is  quantized  at  q = 1 so  that  4,  7,  and  3 

levels  respectively  represent  each  number.  Then  Figure  4.3-11  shows  the  sequence 

as  it  goes  through  the  quantizer;  the  g.  , g_,  ...  g wells;  the  I , T , ...  I 

L Z n 1 2 n 
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Pfl! 

Pg2 

Pq3 


LHW 

BG 

TW 

TG 


channels;  and  the  large  holding  well.  It  also  shows  the  removal  sequence  from 
the  large  holding  well  and  the  remainder  at  each  stage.  The  numbers  are  removed 
in  order  of  decreasing  magnitude  7,  5,  4 which  shows  the  numbers  have  been 
sorted  by  magnitude. 


4.3.3  Non-Maximum  Suppression 

The  Gradient  Operator  extracts  edges  in  either  the  horizontal  or  vertical 
direction;  the  Non-Maximum  Suppression  Algorithm  then  looks  in  a direction 
perpendicular  to  the  edge  for  a larger  gradient.  If  a larger  value  cannot 
be  found,  the  edge  under  consideration  is  retained;  the  edge  is  removed  if 
a larger  value  is  found.  The  Algorithm  is  shown  in  Figure  4.3-12. 
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Figure  4.3-11  Sorting  Sequence 
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Figure  A. 3 12.  Arrangement  of  Comparison  for  Non-Maximum  Suppression 


The  gradient  under  consideration  is  a vertical  one  and  the  area  examined  tot 
larger  gradients  is  in  the  vertical  direction. 

Embodiment  of  the  Non-Maximum  Suppression  Algorithm  (NMS)  requires  severa 
operations  with  CCD  structures.-  A key  part  of  NMS  is  extracting  the  largest 
xm  gradient  value  in  the  neighborhood  surrounding  y;  xm  is  then  compared  to  the 
gradient  value  yg  representing  the  yth  pixel.  Sorting  the  x values  to  obtain 
can  be  accomplished  by  the  sorting  operation  described  earlier.  Comparing 

Xm  anc“  cnn  done  by  the  subtraction  module  also  described  before.  Then 
the  block  diagram  appears  in  Figure  4 3—13 


Figure  A. 13. 
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NMS  Block  Diagram 


This  time  the  subtraction  module  outputs  an  enable  signal  to  the  CCD 
shift  register  instead  of  the  actual  difference.  A blocking  gate  shown  in 
Figure  4. 3.-14  is  used  to  block  (enable)  y from  entering  the  register. 

O 


P + 


Figure  4. 3.-14  Blocking  Gate 
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4.3.4  Threshold  Determination 


The  philosophy  here  is  not  to  attempt  to  find  a single  threshold  but 
rather  to  use  a set  of  thresholds  which  span  the  range  of  gray  scale.  For  the 
NVL  FLIR  data,  fifteen  (15)  gray  scale  levels  represented  the  gray  scale 
ranges  and  selecting  a threshold  every  three  gray  scale  levels  was  deemed 
to  give  satisfactory  target  detections  by  the  University  of  Maryland. 

To  implement  this  type  of  algorithm  would  require  a sorter  (as  described 
in  section  4. 3. 1.2  which  arranges  the  gray  levels  in  descending  order 

for  the  image.  The  first  number  (the  largest)  leaving  the  sorter  corresponds 
to  the  first  threshold  and  sets  a counter  to  1.  The  second  exiting 
number  is  compared  with  the  first  and  the  counter  updates  to  2 if  the  second 
number  is  different.  If  it  is  the  same,  the  counter  remains  at  1.  In 
general,  each  number  is  compared  with  the  previous  one  to  determine  if  the 
counter  should  be  updated.  When  the  counter  gets  to  3,  the  second  threshold 
level  is  determined.  In  this  manner  every  third  gray  scale  level  is 
selected  and  used  as  a threshold.  A block  diagram  of  the  implementation  is 
shown  in  Figure  4.3-15. 

4.3.5  Connected  Components 

The  purpose  of  the  algorithm  is  to  segment  the  image  data  stream  into 
smaller  domains.  Each  small  domain  includes  a single  object  in  the  image 
plane.  This  algorithm  distinguishes  between  objects  and  isolates  regions 
so  that  statistics  for  Classification  Logic  can  be  obtained. 

Assume  that  the  original  image  has  been  thresholded  and  the  result  is  in 
binary  form  with  gray  levels  exceeding  g^  shown  as  l’s  in  Figure  4.3-16. 
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Figure  4.3-15.  Threshold  Determination 
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Binary  Image  77'0189-V-3 


Figure  4.3-16a.  Binarv  Imaae 
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Figure  4.3-16b,  Computations  for  second  Row 
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Two  image  lines  are  retained  in  memory  so  that  each  pixel  can  examine  Its 
neighbors  to  the  left  and  right  and  above  and  below.  No  diagonal  connections 
are  permitted  under  this  convention,  and  an  adjacent  (horizontal  or  vertical) 
pixel  must  be  occupied  in  order  to  make  a connection.  No  skips  or  gaps  are 
allowed,  and  the  computations  start  one  pixel  in  from  the  edge.  In  Figure 
A . 3- 16b,  there  are  four  distinct  regions.  A,  B,  C,  and  D.  The  only  possible 
connection  between  regions  B and  C is  through  a diagonal,  which  is  not  allowed 
Computations  for  the  fourth  row  are  seen  in  Figure  4.3-16c. 
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Figure  4.3-16c.  Computations  for  Fourth  Row 


Here,  there  is  a connection  between  regions  B and  C and  an  equivalence  statement 
B = C,  is  carried  along.  At  the  end  of  the  sixth  row,  there  is  another 
connection  between  C and  D (C  = D)  and  all  the  regions  are  completed  as  seen 
in  Figure  4.3-16d. 


4-23 


A 

B 

B 

B 

B 

B 

B 

A 

B 

B 

B 

B 

B 

B 

A 

B 

B 

B 

B 

B 

A 

B 

B 

B 

B 

B 

A 

B 

B 

B 

B 

B 

B 

B 

B 

B 

B 

B 

B 

77-01 89-V-6 

Figure  4.3-16d.  Completed  Image 


The  areas  of  A,  B,  C and  D are  computed  by  cumulating  the  number  of  pixels 
assigned  to  each.  The  perimeter  is  calculated  by  cumulating  the  number  of 
pixels  assigned  to  each  region  which  are  neighbors  of  zeros,  i.e.,  the 
neighbors  did  not  exceed  the  gray  level  threshold,  g^. 
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a.  Coloring  Operator 


We  assume  that  the  Connected  Components  Algorithm  processes  a binary 
image,  i.e.,  each  pixel  contains  either  a one  (1)  or  a zero  (0).  The  binary 
data  stream  will  enter  the  Coloring  Operator  and  emerge  transformed  into 
different  colors  or  signal  levels  for  different  shapes.  Read  out  of  the 
binary  picture  will  progress  one  horizontal  line  at  a time  starting  with  the 
top  line  and  progressing  downward.  Each  horizontal  line  will  be  read  out 
from  left  to  right.  Since  the  image  data  is  read  out  serially,  the  Coloring 
Operator  is  a local  operator.  The  coloring  of  each  non-zero  element  in  the 
image  plane  will  be  done  as  shown  in  Figure  4.3-17. 


Figure  4.3-17.  Data  Flow  Through  Coloring  Operator 
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The  Coloring  Operator  is  a transformation  from  a binary  picture  to  a colored 


pne  by  a mapping  T 


T (A,  B,  M) : C 


where  C is  the  color  of  the  transformed  pixel  C,  the 
represent  nearest  neighbors  of  C,  and  M represents  an 
relative  locations  of  pixels  A,  B,  and  C in  the  image 
Figure  4.3-18. 


variables  A and  B 
available  color.  The 
plane  are  shown  in 


Figure  4.3-18.  Relative  Location  of  A,  B,  C 


We  define  the  coloring  window  as  always  containing  these  three  elements. 
Elements  A and  B are  nearest  neighbors  of  C and  have  already  been  processed 
by  the  Operator.  Element  B is  located  one  horizontal  line  above  elements 
C and  A.  Element  C is  being  painted  by  the  Coloring  Operator  according  to 
the  following  rule: 


For  C / 0 


A if  A (<  0,  B / 0 
B if  A - 0,  B / 0 
H if  A - 0,  B = 0 


When  adjacent  elements  have  different  colors,  the  element  being  painted  assumes 
the  color  of  the  nearest  neighbor  in  the  same  line  (rows  dominate).  Whenever 
elements  A and  B are  zero  and  element  C is  not  zero,  element  C^  is  given  a 
new  color.  Multi-colored  object  regions  occur  when  pixels  A and  B are  both 
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non-zero,  not  equal  to  each  other,  and  C is  non-zero,  i.e.. 

For  C ^ 0,  A t B + 0 

Painting  of  the  binary  picture  with  different  colors  will  be  accomplished 
by  using  a different  number  of  charge  quanta  to  represent  various  colors. 

The  number  of  colors  required  is  a function  of  target  size,  shape,  and 
frequency  of  occurrence  within  the  image  plane.  For  purposes  of  the  immediate 
discussion,  we  shall  assume  K colors  are  required,  therefore  each  non  zero 
pixel  can  require  up  to  K memory  locations  to  represent  its  color  by  a 
thermometer  code  (see  Quantizer  Section  4. 3. 1.2). 


The  configuration  of  the  Coloring  Operator  is  shown  in  Figure  4.3-19; 
note  that  there  are  K outputs  for  a single  input  C.  Such  a construction 
is  required  to  reduce  cumraulative  errors  if  analog  implementation  of  Module  C 
is  attempted.  Moreover,  the  configuration  shown  in  Figure  4.3-19  is  readily 
amenable  to  LSI  and  CCD  technology.  Each  bit  of  C^  output  is  governed  by 
the  expression, 


C 4 If  A*  t o.  Bk  + o 

' »K1£V° 

^ 1£  ■ BK  ■ 0 


The  required  input  for  operating  the  Coloring  Operator  are  C,  a\  B^  and  M* . 
Input  A^  is  readily  obtained  by  tapping  the  first  vector  element  stored  in  the 
SI/SO  CCD  delay  line.  Similarly,  vector  element  is  also  obtained  by  tapping 
the  output  of  the  SI/SO  CCD  delay  line.  The  construction  of  this  delay  line 
is  readily  feasible  and  has  been  addressed  in  the  Data  Flow  Section. 
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Figure  4.3-19.  Details  of  the  Coloring  Operator 
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b . System  Block  Diagram 


77  0942  V 16 


Figure  4.3-20.  System  Block  Diagram 

The  System  Block  Diagram  for  the  Connected  Components  Algorithm  is  shown  in 
Figure  4.3-20.  Before  continuing  with  the  specific  hardware  implementation 
it  is  useful  to  consider  a general  description  of  the  implementation  as 
a review  and  an  indication  of  where  we  are  going  next. 
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The  purpose  of  the  Connected  Components  Algorithm  is  to  segment  an 
image  frame  into  object  regions;  these  object  regions  are  potential  shapes 
of  interest  and  features  are  extracted  from  them  for  classification  purposes. 
We  assume  that  Time  Delay  Integration  is  part  of  focal  plane  signal  processing 
which  implies  that  the  image  comes  to  the  cuer  in  the  form  of  one  line  at  a 
time,  i.e.,  the  pixels  in  one  line  arrive  in  parallel.  The  Connected 
Components  Operator  then  moves  along  this  line  of  pixels,  with  the  previous 
line  in  memory,  determining  which  pixels  are  part  of  a particular  object 
region  or  if  a new  object  region  is  starting.  If  we  are  to  extract  features 
from  each  object  region,  there  must  be  a means  for  distinguishing  between 
different  object  regions.  One  approach  to  the  problem  is  to  color  each  object 
region  with  a different  color  and  then  have  a feature  extractor  operator 
assigned  to  each  color.  Where  an  object  has  several  colors,  the  feature 
extractors  corresponding  to  those  colors  accumulate  their  features,  dump 
them  in  a scratch  feature  extractor  to  combine  them,  and  reassign  the  result 
to  one  of  the  two  feature  extractors.  Let  us  now  proceed  into  the  specific 
areas . 

c . Feature  Extractor 

There  is  a Feature  Extractor  corresponding  to  each  color  as  shown 
in  Figure  4.3-20.  The  signals  shown  in  Figure  4.3-20  enable  the  particular 
Feature  Extractors  for  computations  when  an  object  has  several  colors.  They 
also  direct  which  Feature  Extractor  will  receive  the  contents  of  the  scratch 
Feature  Extractor,  i.e.,  which  color  will  dominate.  We  have  organized 
Figure  4.3-20  so  that  the  number  of  inputs  to  the  Feature  Extractors  is 
minimized.  Each  Feature  Extractor  is  visualized  as  a many-channeled,  large 
holding  well  which  follows  along  the  lines  of  the  sorter.  (See  Section 
4.3.2) 
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Figure  4.3-20.  Organization  of  Feature  Extractors 

Each  channel  would  correspond  to  a particular  feature  and  since  the  features 
are  cumulative,  they  would  simply  add  in  the  scratch  feature  extractor.  A 
discussion  and  implementation  of  the  features  by  the  University  of  Maryland 
is  found  in  Section  3.8.  We  have  described  the  front  and  back  ends  of  the 
Connected  Components  Algorithm,  now  we  lay  out  the  middle  portion  which  is 
designed  to  handle  multi-colored  objects. 
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a.  Equivalence  Statements 


We  present  in  Figure  4,3-21  an  arbitrary  object  region  which  provides 
sufficient  characteristics  to  explain  the  remainder  of  the  Connected 
Components  Algorithm. 
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Figure  4.3-21.  Arbitary  Object  Region 

We  are  coloring  the  object  from  left  to  right  and  from  the  top  to  the  bottom.  At 
row  1,  two  seemingly  different  objects  are  identified  and  are  colored  1 and  2, 
respectively.  At  row  4,  a third  object  is  detected  which  doesn’t  have  any 
vertical  or  horizontal  connections  to  objects  1 and  2 and  so  is  colored  with 
another  color,  no.  3.  The  coloring  of  these  apparently  distinct  objects  continues 
through  row  9.  Feature  Extractors  1,  2,  and  3 are  accumulating  features  for 
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each  of  the  three  objects  and  their  respective  close  out  clocks  are  running. 


At  pixel  (10.5)  a vertical  connection  between  different  colors  (1  and  3)  is 
noted.  Since  rows  dominate,  color  1 is  placed  in  pixel  (10,5).  We  are  now 
alerted  that  Feature  Extractors  1 and  3 will  be  combined  in  the  Scratch 
Feature  Extractor  at  some  time  in  the  future  and  we  want  to  remember  this 
equivalence.  Since  RAM  structures  are,  so  far,  alien  to  the  CCD  world, 
we  shall  use  a Switching  Matrix  as  shown  in  Figure  4.3-22.  A connection  is 
formed  between  row  1 and  column  3.  Then  every  time  color  3 is  encountered 
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Figure  4,3-22.  Switching  Matrix 

in  the  object  region,  the  3 line  is  pulsed  and  the  signal  exits  the  Switching 
Matrix  on  row  color  1.  At  pixel  (10,6)  for  example,  we  paint  color  1 because 
there  is  a horizontal  connection  with  pixel  (10,5)  and  column  color  3 leaves  the 
Switching  Matrix  on  the  row  color  1 line.  At  pixel  (10,9),  an  apparently  new 
object  is  encountered  and  color  no.  4 is  introduced.  At  pixel  (10,11),  a 
vertical  connection  between  color  4 and  3 is  detected  and  color  4 dominates. 
Color  3 is  already  latched  to  color  1,  so  the  connection  establishing  color  4 
as  dominating  color  1 is  made  in  the  switching  matrix  as  shown  in  Figure  4.3-23. 
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Figure  4.3-  23.  Switching  Matrix 

Color  1 is  a dominate  color  with  respect  to  color  3 and  a Subordinate  Color 
with  respect  to  color  4.  At  pixel  (10,12),  color  4 is  painted  in  because 
of  the  horizontal  connection  with  pixel  (10,11).  At  pixel  (10,13),  color  3 
is  closed  out  as  shown  by  the  symbol  C3  in  Figure  4.3-21.  This  means  that 
an  entire  horizontal  image  has  been  traversed  since  the  last  color  3 was 
entered.  There  is  a clock  associated  with  each  Feature  Extractor  which 
keeps  track  of  the  time  since  the  last  entry  to  that  Feature  Extractor.  The 
clock  turns  over  at  the  rate  that  the  pixels  are  processed  as  the  Coloring  Operator 
moves  along  a line  of  image.  When  the  clock  equals  the  number  of  pixels  in 
a line  of  image,  the  color  is  closed  out.  To  close  out  a color,  e.g.  color 
no.  3,  Clock  no.  3 sends  a pulse  to  the  Switching  Matrix  which  identifies 
color  no.  1 as  the  other  part  of  the  equivalence.  Since  color  no.  3 is  closed 
first,  it  will  be  merged  into  color  no.  1.  This  is  accomplished  by  entering 
the  contents  of  Feature  Extractors  1 and  3 into  the  Scratch  Feature  Extractor 
to  combine  them,  and  placing  the  results  in  Feature  Extractor  1.  Then  the 
connection  between  colors  1 and  3 in  the  Switching  Matrix  is  opened  and  color  3 
is  ready  to  be  used  again.  The  remainder  of  row  10  is  straight  forward. 


4-34 


At  row  11,  reference  to  the  Switching  Matrix  places  color  4 in  pixel 
(11,1).  Also  at  pixel  (11,7),  color  1 is  closed  out,  the  result  placed  in 
Feature  Extractor  no.  4,  and  the  connection  between  colors  1 and  4 in  the 
Switching  Matrix  is  opened.  Color  4 continues  to  be  painted  through  row  11, 
including  pixel  (11,16)  where  row  colors  dominate  column  colors.  And  a connection 
between  colors  4 and  2 is  established  in  the  Switching  Matrix.  Color  4 is 
painted  at  pixel  (11,21)  because  of  this  connection.  At  pixel  (11,23)  color  2 
is  closed  out  and  the  results  placed  in  the  Feature  Extractor  of  color  4.  The 
remainder  of  the  object  region  is  straight  forward.  The  Switching  Matrix, 
connections  (•)  and  disconnections  , and  the  equivalence  statements 

are  shown  in  Figure  4.3-24. 
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Figure  4.3-24. 
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e.  Switching  Matrix 


The  Switching  Matrix  is  shown  in  Figure  4.3-25. 
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Figure  4.3-25.  Switching  Matrix 
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Each  color  data  element,  analog  signal  occupying  one  CCD  site,  is 
quantized  into  a K element  vector.  Only  the  highest  subscript  vector  element 
is  retained  for  each  color.  The  highest  element  retained  enables  one  vertical 
line  of  the  K lines  connected  to  the  Color  Line  Decode  Module.  Each  vertical 
line  from  the  CLDM  is  connected  via  a switch  to  only  one  horizontal  line  going 
to  the  K Feature  Extractors.  Normally  the  K**1  CLDM  line  is  connected  to  the 
horizontal  classifier  line. 

Quantizing  the  analog  color  signal  requires  a Quantizer  described 
previously  in  regard  to  the  Median  Filter.  Enabling  the  particular  vertical 
line  to  the  CLDM  is  not  difficult  since  the  largest  non  zero  element  is 
followed  by  a zero  element  unless  it  is  the  K*1*1  element,  in  which  the  K-l 
and  elements  are  both  ones.  Figure  4.3-26a  shows  an  enabling  circuit 

for  the  vertical  CLDM  lines. 
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Figure  4.3-26a.  Enable  Circuit 
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The  color  element  corresponding  to  the  vertical  CLDM  line  is  fed  directly 

to  the  AND  gate.  The  complement  of  the  next  higher  element  is  also  fed  to  the 

til 

AND  gate;  this  gives  the  non-zero,  zero  combination.  For  the  K line,  the 

K-l  element  is  fed  uncomplemented  to  the  AND  gate.  This  is  the  normal 

mode  of  operation  for  the  Switching  Matrix;  next  we  consider  the  case  when 

portions  of  the  same  target  have  different  colors. 

We  will  show  how  the  outputs  from  vector  A“^  and  modules  can  modify 

the  diagonal  interconnection  between  the  CLDM  and  the  K Feature  Extractors. 

A switch  is  located  at  each  intersection  between  the  horizontal  and  vertical 

lines  in  Figure  4.3-25.  The  function  of  this  switch  is  to  repaint  each 

target  into  a single  color  and  this  switch  will  be  reset  when  the  color 

has  been  closed  out.  For  example,  if  vector  A^  has  color  i,  1 _<  i _<  K,  and 

vector  has  color  j , i _<  j <_  K,  i^j,  only  the  i,  j latch  switch  is  enabled. 

Once  this  happens,  all  the  data  elements  to  be  painted  with  the  j*'*1  color 

th 

will  be  painted  with  the  i color.  The  i and  j elements  will  be  channeled 
to  a single  Feature  Extractor  i,  also.  This  color  merge  operation  is 
achieved  by  connecting  the  i and  j outputs  from  the  CLDM  unit  via  switches 
to  the  i^1  Classifier. 

To  a large  extent,  then,  merging  different  colors  within  the  same  target 
can  be  achieved  by  selecting  appropriate  switches  and  connecting  them  at  the 
intersection  between  the  vertical  and  horizontal  lines  of  Figure  4.3-25. 

A diagram  of  such  a switch  is  shown  in  Figure  4.3-26b.  Two  types  of  switches 
are  required  for  the  Target  Sorter;  switches  on  the  main  diagonal  will  be 
different  from  those  placed  off  the  main  diagonal.  From  the  switch  configu- 
ration shown,  only  one  switch  will  be  on  in  any  given  column.  Therefore,  at 
most  only  one  Feature  Extractor  line  will  be  connected  to  each  vertical  line. 
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Figure  4.3-26b.  Switches  within  the  Target  Sorter 

The  latch  shown  in  Figure  4.3-26b  is  reset  such  that  initially  only  the  main 
diagonal  switches  can  be  enabled.  A reset  input  to  each  i,  j latch  switch 
will  be  controlled  by  the  outputs  from  the  vector  A"*"  and  B*  modules  shown  in 
Figure  4.3-25.  Each  module  will  only  contain  one  element  with  the  highest 
subscript  like  the  vector  in  the  CLDM  module.  Existence  of  non-zero  elements 
in  the  vectors  A and  modules  will  only  occur  when  the  Coloring  Operator 
detects  two  adjacent  colors  during  the  target  painting.  That  is,  when 
A*  ^ B1  ^ 0 a cont*it:ion  which  represents  multi-color  in  one  target.  A flag 
for  such  a condition  is  provided  by  the  CO  and  will  enable  the  A1  and  B1 
modules  for  one  clock  period.  The  set  inputs  to  the  latch  switches  are 
gated  such  that  only  one  latch  is  enabled  for  any  combination  of  vectors  A 
and  B. 
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til  til 

Once  the  j color  has  been  closed  out,  it  will  reset  the  j line  such 

th  th 

that  all  latching  switches  connected  to  the  j line  (excluding  the  j 
CLDM  switch)  will  be  open.  Moreover,  the  j ^ color  will  be  returned  to 
the  Color  Storage  and  Memory  Module  located  in  the  Color  Operator  Unit. 

4.3.6  Super  Slice 

Suppose  an  image  has  been  processed  through  the  Non  Maximum  Suppression 
and  Connected  Components  Algorithms  independently  as  seen  in  Figure  4.3-27. 


A 

H 

B 

B 

R 

A 

B 

n 

B 

B 

A 

U 

B 

B 

B 

B 

A 

B 

B 

B 

B 

A 

B 

B 

B 

B 

e 

B 

B 

B 

B 

B 

B 

n 

• 

77  0189  v / 


Figure  4.3-27a.  Non  Maximum 
Suppression  Output 


Figure  4.3-27b.  Connected 
Component  Output 


Region  A has  a perfect  score:  every  pixel  of  maximum  gradient  is  matched  as  a 
perimeter  point  of  A.  Region  B matched  23  points  out  of  26  possible  gradient 
points  but  it  also  produced  5 perimeter  points  which  were  not  matched  by 
gradient  pixels.  This  match  is  done,  in  parallel,  for  each  of  the  threshold 
levels. 
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4.3.7  Feature  Extraction 

A tentative  list  of  features  which  Maryland  will  use  for  classification 
includes  perimeter  match,  area,  perimeter  extent,  average  gray  level,  and 
maximum  height  and  width.  The  purpose  of  this  section  is  to  didcuss  the 
implementation  of  a feature  extractor. 

At  this  point  in  the  data  flow,  we  may  assume  that  the  object  has  been 
segmented  and  the  existence  of  more  than  one  color  within  the  object  has  been 
resolved.  Also  the  object  in  its  entirety  has  been  sent  to  one  of  the  K 
Feature  Extractors  discussed  previously.  Maryland  has  produced  a tentative 
set  of  features  thus  far  so  the  hardware  effort  will  be  limited,  at  present, 
to  those  features.  We  shall  continue  with  the  assumption  that  the  image 
is  entering  the  Feature  Extractors  one  horizontal  line  at  a line,  the 
processing  is  moving  down  the  image,  and  from  right  to  left, 
a.  Area 

Assume  an  object  shape  such  as  that  shown  in  Figure  4.3-28. 


Figure  4.3-28.  Arbitrary  Target 

The  area  of  the  object  can  be  obtained  simply  by  summing  the  number  of  pixels 
within  the  target;  in  this  case  the  area  is  48  pixels.  The  Area  Feature 
Extractor  then  is  seen  in  Figure  4.3-29. 
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Figure  4,3-29.  Area  Feature  Extractor 

b.  Perimeter 

Both  the  Perimeter  Match  and  Extent  may  be  considered  together.  Each 
pixel  has  eight  neighbors;  a perimeter  point  is  defined  as  having  at  least 
one  zero  neighbor,  similarly  an  interior  point  is  defined  as  having  eight 
non-zero  neighbors.  Recall  that  these  definitions  apply  to  the  binary  image 
which  was  developed  prior  to  the  Connected  Components  Algorithm.  However, 
in  the  Classifier  Module  no  exterior  neighbors  would  be  allowed  to  enter,  so 
the  test  of  a perimeter  pixel  may  be  conducted  as  the  complement  of  an  interior 
target  pixel.  That  is,  only  target  pixels  enter  the  Classifier;  these  target 
pixels  which  are  not  interior  target  points  must  be  perimeter  target  pixels. 

The  Perimeter  Feature  Extractor  would  consist  of  three  lines  of  memory, 
each  525  pixels  long  corresponding  to  the  image  frame  width.  The  center 
pixel  would  be  the  location  of  the  pixel  being  examined  for  neighbors. 
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An  Interesting  problem  is  horizotally  aligning  these  three  lines  of  memory 
within  the  Classifier  in  the  same  way  they  are  aligned  in  the  image.  This 
can  be  accomplished  by  using  three  lines  of  memory  forming  a serpentine  and 
injecting  a bit  into  the  delay  lines  for  the  appropriate  color  location.  Non 
destructive  readouts  form  the  3 x 3 moving  neighborhood  as  the  pixels  shift 
through  as  shown  in  Figure  4.3-30. 
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Figure  4. 3- 30a.  Image 
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Figure  4.3-30b.  Serpentine  Implementation 
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The  second  to  last  pixel  in  the  second  row,  X^9  is  always  the  pixel  position 
being  tested  as  a perimeter  point.  This  implementation  does  not  have  to  make 
special  allowances  for  pixels  located  on  the  edges  of  a target.  This  is  the 
same  technique  described  in  the  second  and  third  quarterly  reports  for  forming 
the  moving  windows  for  Gradient  Operator,  Median  Filter,  and  Non  Maximum 
Suppression.  We  can  AND  the  neighbors  of  X^,  complement  the  output  and  AND  it 
with  X^.  If  the  output  of  the  first  AND  is  zero,  then  one  of  X^’s  neighbors 
is  zero  and  X^  is  a perimeter  point.  See  Figure  4.3-31  for  the  logic.  This 
is  fed  to  a summer  and  the  total  number  of  perimeter  points  for  a target 
is  produced.  For  the  Perimeter  Match,  the  important  thing  is  to  clock  the 
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Figure  A. 3-31.  Perimeter  Point  Logic 

image  of  edges  at  the  same  rate  that  the  perimeter  extent  of  the  same  target 

is  determined.  Then  is  compared  with  its  geometric  counterpart  in  the 

edge  domain,  i.e.,  at  the  same  x,  y position  in  the  image.  Both  are  fed 

A- A A 


to  an  AND  gate  and  if  non-zero,  will  increment  another  summer  which  keeps 
track  of  the  Perimeter  Match  Score. 

c . Maximum  Target  Height  and  Width 

This  has  to  do  with  keeping  track  of  the  maximum  target  excursions  in 
x and  y.  For  the  maximum  x extent  suppose  we  use  the  same  three  delay  line 
memory  which  was  used  for  Perimeter  Match.  We  further  assume  a more 
complicated  object  as  shown  in  Figure  4.3-32.  The  left  most  site  of  the  second 
row  and  the  complement  of  the  right  most  site  of  the  last  row  are  ANDed. 

If  a one  comes  from  the  AND  gate  the  summer  is  incremented  by  one.  Essentially, 
we  count  the  number  of  pixels  in  the  first  row  of  the  image  and  then  determine 
the  amount  of  extension  in  the  following  rows  and  add  it  to  the  first  sum. 

Note  that  number  indicated  by  the  summer  corresponds  to  a count  of  the  number 
image  columns  in  the  first  rows. 

The  y extent  of  the  image  can  be  determined  by  counting  the  number  of 
pixels  passing  through  the  three  line  delay  and  dividing  the  number  of  pixels 
per  line. 

d.  Average  Gray  Level 

This  involves  clocking  the  Median  Filtered  image  at  the  same  time  the 
color  elements  are  clocked  through  the  Feature  Extractor.  The  gray  level 
sum  is  obtained  and  divided  by  the  area  obtained  earlier  by  the  Feature 
Extractor.  The  quotient  yields  the  average  gray  level. 

4.3.8  Data  Flow 

The  Median  Filter,  Gradient  and  Non-Maximum  Suppression  Operators  are 
calculated  for  small  windows  which  move  over  the  entire  frame.  These 
windows  are  formed  by  parallel  shifting  one  line  of  image  from  the  TDL  array 
into  a parallel  in,  Serial  out  Shift  register  (Figure  4.3-33a).  This  register 
and  others  then  forms  a serpentine  delay  through  which  the  pixels  are  shifted. 
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Figure  4.3-32a.  Object 


Figure  4.3-32c.  Tapping  Second  Image  Row 
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Figure  4.3-32d.  Tapping  Third  Image  Row 
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Figure  A.3-33a.  Focal  Plane  Arrangement 
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Figure  4.3-33b.  Details  of  Serpentine  Delay  Line 
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Non-destructive  readouts  form  the  regions  comprising  the  appropriate 
window  (Figure  4.3-33b). 

It  appears  that  the  computation  speed  of  the  Median  Filter  and  Gradient 
Operator  is  conservatively  estimated  at  50-100  kHZ,  hence  a parallel 
organization  of  the  focal  plane  (see  Figure  4.3-34)  is  necessary  for  a 1 
megapixel/sec.  data  rate.  Suppose  we  divide  the  PI/SO  register  immediately 
below  the  focal  plane  into  20  vertical  sections,  each  approximately  34  pixels 
wide,  and  each  with  its  own  serpentine  CCD  delay  line  as  seen  in  Figure  4.3-35. 
If  the  image  is  640  pixels  wide,  we  divide  the  register  into  20  sections  of 
approximately  34  pixels  each  to  avoid  problems  associated  in  calculating 
medians  and  gradients  along  the  edge  of  an  image.  Each  vertical  section  is 
eight  stages  long  to  accommodate  the  Gradient  Operator,  which  requires  eight 
lines  of  storage,  and  34  pixels  wide  for  a total  number  of  272  shifts  at 
50  kHZ.  This  appears  to  avoid  numerical  integrity  problems.  Performing 
the  three  algorithms  consecutively  requires  three  separate  moving  windows 
and  clocks  to  control  the  non  destructive  readouts  for  each.  The  Median 
Filter  Algorithm  requires  5 stages  of  delay,  the  Gradient  Operator  requires 
8 lines  (stages),  and  the  Non  Maximum  Suppression  Algorithm  requires  7 lines. 
The  series  arrangement  is  shown  in  Figure  4.3-35.  The  outputs  from  Non 
Maximum  Suppression  feed  the  SuperSlice  Algorithm.  Referring  to  the  System 
Flow  Chart  (Figure  A. 3-1),  the  other  path  requires  that  the  original  image 
be  thresholded  at  g^,  g9  ...  g^  values.  The  computation  of  g^,  g^, •••gn 
will  require  some  knowledge  of  all  the  gray  level  values  within  a frame, 
hence  frame  storage  can  be  expected,  or  thresholds  determined  from  the 
two  prior  frames  will  be  used  for  the  present  frame.  In  any  event,  having 
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Figure  4.3-34.  PI/SO  and  Serpentine  in  Parallel 
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Figure  4.3-35.  Focal  Plane  Data  Flow 
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obtained  g^,  g^,  ...  g^,  the  entire  frame  can  be  thresholded,  by  examining 

the  rows  in  parallel,  and  Connected  Component  analysis  started,  as  in  Figure  4.3-36. 

The  frame  is  clocked  out  of  storage,  one  pixel  at  a time,  and  each  pixel  is 

thresholded  in  parallel  forming  frames  Fg^,  Fg^,  ...  Fg^.  Similarly,  the 

Connected  Component  Algorithm  is  performed  in  parallel  with  the  required 

one  line  delay  for  each.  To  avoid  additional  storage,  the  SuperSlice  Algorithm 

will  be  done  as  the  output  from  Connected  Components  becomes  available  and 

the  scores  cumulated. 

4.3.9  Storage  Requirements 

From  Figs.  4.3-34  and  4.3-35,  it  appears  that  20  stages  of  serpentine  delay 

are  necessary,  each  stage  is  approximately  34  pixels  long  and  divided  into  20 
separate  sections  for  a total  of  400  delay  lines.  To  achieve  a 1 megapixel/ 
sec.  speed,  we  are  conservatively  talking  about  20  Median  Filter  Processors, 

20  Gradient  Operator  Processors,  and  20  Non-Maximum  Suppression  Operators 
for  a total  of  60.  Assuming  five  thresholds,  as  an  example,  an  additional  12 
lines  of  delay  are  required,  plus  five  Connected  Component  Processors. 

4.4  Hardware  Fabrication 


4.4.1  Algorithms 

In  previous  work,  we  have  described  the  hardware  implementation  of  a 
number  of  algorithms.  In  this  section  we  shall  take  that  work  a step  further 
and  consider  the  fabrication  of  these  implementations.  Specifically,  we  shall 
consider  chip  size,  cryogenic  problems,  speeds,  yields,  and  power  consumption 
relevant  to  the  Gradient  Operator,  the  Median  Filter,  Serpentine  Memory, 
Non-Maximum  Suppression,  and  Connected  Components  Elements. 
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Figure  4.3-36.  Connected  Components  Data  Flow 
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a.  Gradient  Operator 


A major  assumption  in  the  analysis  is  that  the  inputs  for  the  Gradient 
Operator  (i.e.,  A,  B,  C,  and  D)  will  be  obtained  from  a separate  IC  chip  which 
will  be  a part  of  the  serpentine  memory  module.  The  structure  of  this  module 
will  be  addressed  in  another  section. 

The  size  of  the  Gradient  Operator  chip  will  be  deduced  by  assigning 
real  estate  to  each  operation  performed  by  the  Operator.  A key  operation  is 
the  absolute  subtraction  module  (ASM)  which  obtains  the  absolute  difference 
between  two  inputs  and  yields  a charge  representing  that  quantity.  Each 
difference  CCD  structure  will  nominally  require  a channel  1.2  mils  wide;  four 
input  channels  are  needed  to  provide  four  charge  packets,  two  representing 
| A^  - B^|  and  two  representing!  C^  - | . The  length  of  each  ASM  will  be 

4 mils,  a size  sufficient  to  provide  a readout  structure  necessary  to  drive 
the  second  stage  of  the  Operator.  The  second  stage  selects  which  output 
| A^  - B^,  | or  | C^  - | is  the  largest  gradient  of  the  i th  pixel  location. 

Combining  the  real  estate  requirement  for  the  first  and  second  stages,  we 
calculate  a chip  size  of  8 mils  x 10  mils. 
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We  assume  a four  phase  gate  construction;  a smaller  number  of  phases 
(which  requires  less  chip  area)  could  be  used;  however, speed-charge  handling 
capacity  and  ease  of  fabrication  favors  four  phase  construction. 

The  structure  advocated  is  exclusively  based  on  MOS  FET  and  CCD 
technology.  Both  MOS  FET  and  CCD  structures  exhibit  improved  performance 
at  cryogenic  temperatures  greater  than  30°K.  At  very  low  cryogenic  temperatures 
(>_  30°K) , the  performance  of  MOS  CCD  structures  begins  to  show  significant 
degradation.  Relative  to  room  temperature  performance, 
experiments  have  shown  that  with  cryogenic  temperatures  we  should  obtain 
higher  operational  speeds  and  lower  noise  figures.  The  improved  performance 
is  attributed  to  increases  in  mobility  resulting  from  lower  levels  of  phonon 
scattering  of  the  signal  carriers. 

The  fabrication  yield  depends  on  chip  size  and  the  number  of  steps. 

The  process  is  very  similar  to  that  for  making  surface  channel  CCD  and  we 
estimate  six  photolithographic  masks.  Cognizant  of  these  similarities,  we 
expect  a yield  of  better  than  50%.  The  variables  which  will  influence  the 
final  chip  configuration  will  be  speed,  charge  handling  requirements  and 
resolution  (uniformity) . Our  present  design  is  conservatively  aimed  at  a 
speed  of  100  kHZ.  Higher  speeds  are  possible,  but  Increasing  the  operating 
speed  from  100  kHZ  to  1 MHZ  will  require  special  and  more  difficult  structures. 

Power  consumption  of  the  Gradient  Operator  will  depend  on  the  operating 
frequency  and  checking  voltages.  Conventionally,  the  power  consumed  by  a CCD 
type  structure  is  expressed  as 

P - CV2  f N 
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where  N is  the  number  of  gates,  C is  the  capacitance  of  each  gate,  V is  the 
clocking  voltage  and  f is  the  operating  frequency.  Computing  the  power 
requirements  we  obtain  less  than  10  milliwatts.  This  level  of  power  consumption 
is  exclusive  of  the  power  requirements  of  the  clocking  circuitry  required  to 
operate  the  Gradient  Operator. 

b.  Median  Filter 

In  the  second  quarterly  report,  Maryland  reported  on  the  significance 
of  the  Median  Filter  Operator  and  Westinghouse  described  an  embodiment.  In 
this  section,  we  shall  consider  aspects  pertinent  to  fabrication. 

The  MFO  chip  as  considered  below  will  not  include  peripheral  clocking 
circuits  or  a structure  for  summing  the  output  from  the  serpentine  CCD  delay. 

We  assume  an  MFO  operating  on  25  pixels  located  within  a moving  window; 
provisions  for  obtaining  the  25  pixels  will  be  built  into  the  CCD  serpentine 
delay  structure  in  the  form  of  non-destructive  readouts.  Each  data  element 
(pixel)  is  assumed  to  have  a dynamic  range  equivalent  to  a 32  level  grey 
scale. 


The  size  of  the  chip  is  determined  primarily  by  the  number  of  pixels 
and  grey  levels.  The  proposed  MFO  is  required  to  operate  as  a moving  window 
device  which  requires  a CCD  memory  capable  of  storing  and  shifting  25  data 
elements  each  of  which  is  quantized  within  a 32  level  grey  scale.  A bank 
of  CCD  memory  registers  with  25  x 32  storage  locations  can  be  achieved  by 
a 6A  mil  by  6A  mil  module.  Included  in  this  estimate  are  areas  for 
incorporating  output  and  input  structures  to  the  CCD  memory. 
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Another  major  block  of  the  MFO  is  the  sorting  module  in  which  the  data 
elements  are  arranged  according  to  size.  This  requires  a bank  of  32  CCD  shift 
registers  which  are  25  elements  long  and  each  row  is  capable  of  being  inde- 
pendently shifted  left  or  right.  An  area  100  mils  wide  by  64  mils  long  is 
sufficient. 

Finally,  the  area  required  for  controlling  the  clocks  operating  the 
sorting  module  is  estimated  to  be  100  mils  by  2 mils. 

Summing  the  different  component  areas  comprising  the  MFO,  we  arrive  at 
an  area  estimate  of  100  mils  by  128  mils. 

All  the  elements  used  in  modelling  the  MFO  are  based  on  field  effect 
phenomena,  hence  we  expect  improved  performance  at  cryogenic  temperatures  in 
accordance  with  experimental  observations.  As  described  in  the  Gradient 
Operator  section,  we  expect  improvement  in  bandwidth  and  noise  reduction. 

The  size  of  the  MFO  (128  x 100  mils)  represents  a large  scale  integration 
device  and  significant  complexity  will  be  encountered  during  fabrication  and 
test.  We  calculate  that  eight  mask  levels  will  be  required.  The  yield, 
dependent  on  chip  size  and  the  number  of  masking  steps,  is  estimated  to  be  5%. 

Required  power  will  be  larger  than  that  consumed  by  a conventional  CCD 
device;  the  demand  for  more  power  comes  from  the  active  logic  devices  used 
for  clock  control  (shift  left  or  right)  functions.  Generally,  power  consumed 
depends  on* operating  speed,  component  cost,  and  system  layout.  Assuming  an 
operating  speed  of  100  kHZ,  the  MFO  will  require  less  than  100  miliwatts  of 
power . 
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c. 


Serpentine  Delay 


A large  number  of  delay  elements  are  required  for  focal  plane  processing; 
the  elements  must  have  large  memory  capacity,  good  transfer  efficiency,  and 
non  destructive  readout  structures. 

The  serpentine  deployment  shown  in  Figure  1-6  requires  a large  number  of 
transfers  which  causes  degradation  in  the  modulation  transfer  function  (MTF) . 
This  negative  effect  can  be  reduced  by  segmenting  the  image  into  several 
columns,  each  of  which  will  be  processed  in  parallel  with  the  other  columns. 
Such  a segmentation  not  only  reduces  the  number  of  CCD  transfers  per  delay 
element  but  reduces  the  operating  speed.  We  postulated  an  algorithm  operating 
speed  of  100  kHZ  based  on  hardware  implementations.  This  led  to  a division 
of  the  IR  image  into  10  columns,  each  68  pixels  wide.  From  Figure  4.3-36,  it 
is  seen  that  Median  Filter  requires  five  (5)  lines  of  delay,  Gradient  Operator 
requires  eight  (8),.  and  Non  Maximum  Suppression  requires  seven  (7)  for  a total 
of  20,  since  the  algorithms  operate  sequentially.  The  total  number  of  shifts 
is  1360  per  column.  At  a clock  frequency  of  100  kHZ,  numerical  degradation 
in  the  order  of  20%  will  occur, which  is  probably  too  high.  The  MTF  can  be 
reduced  in  several  ways. 

The  modulation  transfer  function  is  a function  of  the  input  signal 
frequency,  the  frequency  of  the  shifts  (clock  frequency),  the  number  of  shifts, 
and  the  transfer  efficiency.  The  more  practical  avenues  of  reduction  are 
clock  frequency  and  the  number  of  shifts;  we  can  double  the  number  of  operators 
to  20  each,  and  halve  the  clock  frequency  and  number  of  shifts  to  50  kHZ 
and  680,  respectively . This  may  produce  an  improvement  to  10%  degradation , 
but  this  number  would  have  to  be  confirmed  experimentally.  Of  course  this 
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approach  Increases  the  total  chip  area  which  is  still  small  and  the  external 
clocking  circuitry.  Operating  at  cryogenic  temperatures  will  probably  increase 
the  transfer  efficiency  somewhat.  Further,  the  input  frequency  can  be  band- 
limited  to  decrease  the  MTF.  Using  the  20  column  segmentation,  each  34  pixels 
wide,  a total  of  680  shifts  are  required  to  perform  the  Median  Filter,  Gradient 
Operator,  and  Non  Maximum  Suppression  Algorithms  at  a 1 megapixel/sec.  rate. 

Moreover,  surface  channel  CCD's  are  suitable  for  this  task  within  the 
defined  operating  parameters,  and  the  advantage  of  these  devices  is  realizing 
the  non  destructive  taps.  These  taps  are  necessary  in  extracting  the  ap- 
propriate pixels  for  the  moving  windows  discussed  in  Section  1.3  and  the 
second  quarterly  report. 

The  size  required  for  achieving  a memory  680  elements  long  is  1000 
square  mils  if  four  phase  clocking  is  employed.  Hence  for  20  columns  we 
will  require  a silicon  area  1000  mils  long  by  20  mils  wide. 

Operation  of  the  memory  at  cryogenic  temperatures  will  present  no 
problems  since  its  construction  is  similar  to  the  other  focal  plane  signal 
processing  components.  Considering  the  size  of  this  memory  chip  we  expect 
a yield  of  about  5%.  The  clocking  circuits  required  for  the  memory  module 
operation  are  not  included  in  the  area  calculations. 

General  Observations 

In  this  report  we  have  considered  the  physical  parameters  of  the  focal 
plane  signal  processing  circuit  elements.  In  our  opinion,  no  signal  processing 
operation  defined  and  discussed  contains  any  inherent  characteristics  which 
will  prevent  fabrication.  However,  the  number  and  size  of  the  required  IC 
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modules  is  considerable.  Integration  of  all  the  aforementioned  elements  in 
a single  large  integrated  circuit  is  a high  risk  effort.  Development  of 
each  single  IC  block  first  should  provide  sufficient  test  vehicles  and  data 
needed  to  evaluate  each  signal  processing  component.  Such  data  should  be 
obtained  before  any  large  scale  integration  of  all  the  focal  plane  signal 
processing  is  undertaken.  This  approach  will  result  in  the  most  efficient 
method  leading  towards  LSI  focal  plane  signal  processing. 

d.  Non-Maximum  Suppression  Operator 

Estimating  the  physical  size  of  the  NMSO  will  be  consistent  with  the 
design  rules  followed  for  estimating  the  fabrication  of  primitive  operators 
reported  in  previous  quarterly  reports.  It  should  be  restated  that  the 
estimates  provided  are  preliminary  and  adjustments  are  expected  when  chip 
layouts  are  made.  Estimation  will  be  made  by  first  decomposing  the  operator 
into  its  components  and  calculating  the  physical  dimensions  of  each  port. 

The  NMSO  is  made  up  of  several  components.  A serial  input/parallel 
output  CCD  structure  will  be  used  in  quantizing  the  analog  signals  injected 
into  the  NMSO.  The  size  of  this  component  will  be  5 mils  x 100  mils  wide. 
The  output  from  the  quantizing  module  will  go  into  a sortei  vJiieh  will  be 
13  mils  by  100  mils  in  size.  The  output  from  the  sortei  will  be  injected 
into  a CCD  compare  register  which  will  compare  two  signals.  One  signal  is 
the  largest,  x,  of  the  neighboring  gradient  pixels;  the  other  is  gradient 
pixel  of  interest,  y,  (see  Figure  1-2  in  the  Jan.  31,  1977  report).  The 
difference  output  will  control  the  NMSO  output  according  to  the  following 
rule 
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0 y £ x 

NMSO  Output  = 

y y > X 

Totaling  the  area  required  for  each  NMSO  component,  we  obtain  an  area  of  100 
mils  by  25  mils.  This  chip  will  have  MOS  structures.  We  estimate  a 25 
milliwatt  power  drain  by  the  NMSO  neglecting  peripheral  clocking  and  logic. 

A 10%  yield  for  this  chip  is  predicted. 

e . Connected  Components  Algorithm 

The  Coloring  Operator  consists  of  several  blocks;  the  coloring  logic, 
the  color  delay  line,  the  equivalence  operator,  and  the  color  bank.  Fabrication 
estimates  have  been  discussed  and  obtained  for  these  elements  in  the  fourth 
quarterly  report  based  on  ten  (10)  colors.  We  now  extrapolate  those  estimates 
to  handle  one  hundred  (100)  colors. 

Our  previous  estimate  for  the  coloring  logic  was  10  mils  by  50  mils. 

Laying  these  modules  side  by  side  and  end  to  end,  yields  an  area  of  100  mils 
by  50  mils  neglecting  space  for  connections.  The  logic  will  be  MOS  structure 
to  maintain  compatibility  with  other  structures.  A color  delay  line  will  be 
20  elements  wide  by  one  horizontal  TV  line  long.  Consistent  with  the  area 
required  for  previously  described  serpentine  memory  chips,  we  predict  a 400 
mil  x 300  mil  size.  The  color  bank  will  be  very  similar  to  the  Median  Filter 
already  described.  Accounting  for  the  number  of  quantization  levels,  we  obtain 
a 250  mil  x 250  mil  chip.  It  should  be  emphasized  that  the  structure  of  the 
color  bank  chip  will  be  facilitated  by  the  development  work  on  the  Median 
Filter. 

The  switching  matrix  in  the  Equivalence  Operator  is  estimated  to  require 
less  than  a 10  mil  x 10  mil  area  for  each  switching  mode  leading  to  a total 
size  of  1000  mils  x 1000  mils.  The  peripheral  logic  for  controlling  the  matrix 
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will  be  two  modules  1500  mils  x 1500  mils  in  size.  The  total  area  of  the 
Connected  Component  algorithm  is  obtained  by  laying  the  above  elements  side 
by  side  yielding  dimensions,  thus  far,  of  2500  mils  x 2500  mils.  There  are 
100  Feature  Extractors  with  a unit  size  of  12.5  mils  x 12.5  mils  yielding  a 
area  of  1250  mils  x 1250  mils.  Including  the  Feature  Extractors,  the  total 
area  of  the  Connected  Components  Algorithm  is  approximately  3750  mils  x 3750 
mils  or  3 3/A  inches  x 3 3/4  inches. 

The  complexity  of  the  Connected  Components  Algorithm  and  its  size  is 
not  insignificant.  Development  should  proceed  in  steps  such  that  individual 
components  are  fabricated  separately.  A hybrid  configuration  can  be  realized 
by  interconnecting  all  the  component  chips.  With  experimental  evaluation  and 
testing,  progress  towards  a consolidated  version  can  be  realized.  The  power 
and  yield  for  the  operation  and  fabrication  of  the  Connected  Components 
Algorithm  is  difficult  to  estimate.  An  alternate  approach  is  to  estimate 
these  parameters  for  the  individual  components  which  are  comparable  in  size 
and  complexity  to  the  primitive  operators  already  discussed.  Accordingly 
we  feel  that  the  power  consumption  and  yield  for  each  component  will  be  about 
100  - 300  milliwatts  and  approximately  10%  respectively. 
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4.4.2  Focal  Plane  Area 


This  section  presents  a preliminary  estimate  of  the  focal  plane  area 
occupied  by  the  portions  of  the  cueing  system  developed  thus  far,  i.e.,  the 
Image  has  been  smoothed  (Median  Filter),  edges  obtained  (Gradient  Operator), 
edge  width  reduced  (Non-Maximum  Suppression),  the  image  has  been  segmented  into 
targets  (Connected  Components),  and  features  extracted  (area,  height,  width, 
perimeter  extent,  average  gray  level).  The  estimate  is  preliminary  in  the 
sense  that  none  of  the  clocking  circuitry  has  been  included  in  area  estimates 
for  the  operators.  The  reason  is  that  methods  by  which  the  algorithms  will 
handle  edges  of  the  image  frame  has  not  been  specified.  The  estimate  does 
not  include  the  Classifiers. 

Assuming  that  the  focal  plane  is  divided  into  20  columns.  Table  4-1 
shows  the  number  of  processors  required  for  a system  data  rate  of  1 megapixel/ 
sec.  It  also  shows  the  geometric  area  required  for  each  processor  and  an 
estimate  of  the  area  as  defined  above.  The  area  thus  far  is  1 inches  x 
7%  inches.  The  volume  equivalent  is  3 inches  x 3 inches  x 6 inches. 
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4.4.3  Chip  Development 


The  Smart  Sensor  Project  is  scheduled  to  last  for  21  months  with  a key 
circuit  selected  at  the  one  year  mark  and  constructed  in  the  last  nine  months. 
We  wanted  to  select  a circuit  which  was  common  to  as  many  algorithms  as 
possible. 

a.  Development  Review 

Figure  4.4-1  shows  the  algorithms  developed  by  Maryland 
and  the  functions  which  are  required  by  each  algorithm.  A perusal  shows  that 
the  sorter  function  occurs  in  four  out  of  the  five  algorithms  and  is  the  one 
we  selected. 


ALGORITHM 


FUNCTION 


Gradient  Operator 
Median  Filter 

Non  Maximum  Suppression 

Connected  Components 

Histogram 


Absolute  Difference  Comparison 

Quantizer 

Sorter 

Quantizer 

Sorter 

Absolute  Difference 

Quantizer 

Sorter 

Coloring  Operator 

Quantizer 

Sorter 


Figure  4.4-1.  Algorithm  Implementation  by  CCD  Function 

Several  versions  of  the  sorter  were  put  in  production  runs  at  the 
Westinghouse  Advanced  Technology  Laboratory.  Both  version  assume  that  the 
analog  signal  has  already  been  quantized  into  a thermometer  code.  A physical 
analog  is  shown  in  Figure  4.4-2  where  a container  is  filled  with  an  amount  of 
water  (charge),  proportional  to  the  signal  voltage  S^  and  then  the  contents 
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are  poured  into  a tray  of  quantized  bins.  When  a bln  Is  filled  with  water, 
the  water  flows  over  the  top  into  the  next  bin.  The  volume  of  water  is  divided 
between  a number  of  discrete  bins. 


77-0545  V 16 

Figure  4.4-2.  Flow  Analogy  to  Quantization 

One  version  of  the  sorter  takes  the  charges,  q,  residing  in  each  bin 
and  receives  them  in  parallel  as  in  Figure  4.4-3. 


Figure  4.4-3.  CCD  Sorter 

Thus,  there  is  q amount  of  charge  shifted  from  bin  b^  to  channel  1^,  q amount 

of  charge  shifted  from  bin  b ^ to  channel  ^ and  so  on.  By  means  of  gate  pg^, 

pg2»  and  pg^»  the  contents  pf  channels  1^  through  1^  are  shifted  in  parallel 

into  the  large  holding  well  LHW.  The  large  holding  well  is  partitioned  into 
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N channels  also.  Consider  a numerical  example;  a sequence  of  numbers  4,  7,5 

is  quantized  at  q = 1 so  that  4,  7,  and  5 bins  respectively  represent  each 

number.  Then  Figure  4.4-4  shows  the  sequence  as  it  goes  through  the  quantizer, 

the  b- , b0,  ....  b bins,  the  I- , I0,  ....  I channels  and  the  large  holding 
1 2 n 1 L n 

well.  It  also  shows  the  removal  sequence  from  the  large  holding  well  and  the 
remainder  at  each  stage.  The  numbers  are  removed  in  order  of  decreasing 
magnitude  7,  5,  4 which  shows  the  numbers  have  been  sorted  by  magnitude. 

The  other  version  of  the  sorter  is  shown  in  Figure  4.4-5.  Here  the  sorter 
consists  of  individually  controlled  CCD  shift  registers.  Each  register  is 

enabled  by  a clock  pulse  and  the  presence  of  a charge  quantum,  q.  The  lower 
portion  of  Figure  4.4-5  shows  the  random  sequence  3,  1,  5,  2 of  numbers  entering 

the  sorter.  At  stage  2,  the  first  register  will  shift  only.  At  stage  3,  all 
five  registers  will  shift  and  note  that  the  data  is  then  arranged  in  decending 
order . 


We  has  been  using  surface  channel  CCD’s  to  produce  image  primitives  such 
as  produced  by  the  Median  Filter  and  Non-Maximum  Suppression  Operators  because 
of  wider  dynamic  range  and  lower  data  rate  requirements.  However  we  found  that 
the  Connected  Components  Algorithm  could  also  be  implemented  with  buried  channel 
CCD  devices  producing  a smaller  package  so  we  amended  the  chip  feasibility 
program  to  include  buried  channel  devices  also.  The  second  version  of  the 
sorter  was  done  with  surface  channel  CCD's  and  the  first  version  was  in  buried 
channel.  The  buried  channel  device  was  achieved  by  ion  implantation  in  the 
surface  channel  structure.  Probe  tests  showed  that  the  yields  were  not  high 
enough  to  continue  processing  the  buried  channel  wafers,  so  they  were  dropped. 
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Figure  4.4-5.  CCD  Sorter  Fabrication 


The  surface  channel  devices  will  be  used  in  the  demonstration.  We  could  not 
have  met  the  time  schedule  if  we  had  started  the  buried  channel  devices  from  the 
ground  up  as  was  done  successfully  on  other  programs.  For  the  purposes  of 
this  project,  amending  the  lower  risk  surface  channel  masks  seemed  to  be  a 
reasonable  way  to  produce  both  surface  and  buried  channel  sorters. 


Figure  4.4-6  shows  a wafer  of  the  buried  channel  devices.  One  portion 

of  the  demonstration  unit  is  shown  in  Figure  4.4-7,  with  the  shift  registers 
mounted  in  place.  The  ten  shift  registers  are  seen  at  the  top  of  the  unit 
and  ten  thumbwheel  switches  are  shown  below.  These  thumbwheels  represent  the 
unsorted  numbers  which  the  sorter  must  rearrange  in  descending  order.  The 
observer  may  dial  in  any  arrangement  of  numbers  which  he  wishes.  The  outputs 
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Figure  4.4-6.  Buried  Channel  Wafer 


Figure  4.4-7.  Demonstration  Unit 

and  inputs,  i.e.,  the  unsorted  and  sorted  arrangements  are  shown  on  a two-trace 
oscilloscope. 

Westinghouse  IR&D  accounted  for  70  cents  of  every  dollar  spent  on  the 
Smart  Sensor  Project. 
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b.  Feasibility  Demonstration 

The  demonstration  unit  and  a two-trace  oscilloscope  were  exhibited  at  the  DARPA 
Image  Understanding  Symposium  in  September  of  1977  at  Stanford  University  in  Palo  Alto, 
California.  The  symposium  participants  were  encouraged  to  dial  in  their  own 
set  of  random  numbers  on  the  thumbwheel  switches  and  observe  the  ordered 
outputs.  Figure  4.4a,  b,  c,  and  d are  typical  outputs  of  the  two-trace 

oscilloscope  seen  at  the  Symposium.  The  random  sequence  is  shown  in  the  left 
half  of  the  trace  and  the  ordered  sequence  in  the  right  half . 


Figure  4.4a.  Sorter  Display 

Figure  4.4a,  is  a sequence  of  6,  0,  8,  3,  4,  5,  5,  2,  7,  3 arranged  into  an 
ordered  sequence  of  0,  2,  3,  3,  4,  5,  5,  6,  7,  8.  Figure  4.4b  is  a sequence 

of  4,  4,  4,  4,  4,  5,  4,  4,  4,  4 arranged  in  an  ordered  sequence  of  4,  4,  4, 

4,  4,  4,  4,  4,  5.  Figure  4.4c  is  a sequence  of  4,  7,  4,  5,  4,  1,  4,  4,  3,  4 

arranged  in  an  ordered  sequence  of  1,  3,  4,  4,  4,  4,  4,  4,  5,  7.  Figure  4.4d 

is  a sequence  of  4,  4,  4,  4,  4,  3,  4,  4,  4,  4 arranged  in  an  ordered  sequence 

of  3,  4,  4,  4,  4,  4,  4,  4,  4,  4.  The  unit  was  also  demonstrated  at  the  Night 

Vision  Laboratory,  Ft.  Belvoir,  Va.  on  November  28,  1977. 
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Figure  4.4b.  Sorter  Display 


Figure  4.4c.  Sorter  Display 
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Figure  4.4d.  Sorter  Display 
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4 . 5 Conclusions  and  Recommendations 


This  report  has  shown  that  the  Smart  Sensor  can  be  implemented 
with  CCD  technology  and  in  a smaller  package  than  implementation  with 
digital  techniques.  However,  it  is  also  evident  that  the  CCD  package 
package  cannot  be  accomodated  on  the  focal  plane. 

The  total  area  estimate  is  11  1/4  inches  by  7 1/2  inches  . If 
3 inch  by  3inch  modules  were  employed  with  1/2  inch  centers,  the  volume 
dimensions  would  be  3 inches  by  3 inches  by  6 inches.  This  volume 
has  positive  implications  for  missiles  and  other  airborne  platforms,  in 
general. 

The  next  step  in  proving  feasibility  involves  building  some  of  the 
modules  and  checking  them  for  numerical  integrity,  size,  speed,  and  power 
consumption.  These  modules,  e.g.  Median  Filter  would  be  hybrid  packages 
in  the  first  build,  and  the  clocking  circuitry  should  be  included  on  the 
chip.  Another  item  of  particular  interest  is  the  Connected  Components 
Algorithm  with  the  switching  matrix  and  peripheral  control  logic.  Estimates 
of  ultimate  size  for  the  monolithic  elements  would  be  necessary  as  well  as 
estimates  on  the  grouping  of  algorithms  within  the  monolthic  packages. 


4-73 


I 


5.  Publications 


In  accordance  with  contract  provisions,  the  University  of 
Maryland  produced  seven  Quarterly  Technical  Management  Reports, 
a First  Quarter  Report,  and  three  Semiannual  Reports.  West- 
inghouse,  as  subcontractor,  prepared  six  Quarterly  Technical 
Progress  Reports.  In  addition,  the  following  sixteen 
Computer  Science  Technical  Reports,  twelve  Image  Under- 

standing Workshop  papers  and  five  other  proceedings  papers 
were  produced  with  the  support  of  this  contract, 
a)  Computer  Science  Technical  Report  Series 
Durga  P.  Panda,  "A  Method  of  Adaptive  Smoothing  and  Edge 
Enhancement",  TR-504,  February  1977. 

Durga  P.  Panda,  "Segmentation  of  FLIR  Images  by  Pixel 
Classification",  TR-507,  February  1977. 

David  L.  Milgram,  "Region  Tracking  Using  Dynamic  Programming", 
TR- 539 , May  1977. 

David  L.  Milgram,  "Constructing  Trees  for  Region  Description", 
TR-541 , May  1977. 

Durga  P.  Panda,  "Statistical  Analysis  of  Some  Edge  Operators", 
TR-558 , July  1977. 

Durga  P.  Panda,  "Statistical  Properties  of  Thresholded  Images", 
TR-559 , July  1977. 

Robert  A.  Hummel,  "Edge  Detection  Using  Basis  Functions", 

TR-569 , August  1977. 

Yasuo  Nakagawa  and  Azriel  Rosenfeld,  "A  Note  on  the  Use  of  Local 
MIN  and  MAX  Operations  in  Digital  Picture  Processing", 

TR-590 , October,  1977. 


5-1 


Shmuel  Peleg, "Iterative  Histogram  Modification,  2",  TR-606, 
November  1977. 

Charles  R.  Dyer  and  Azriel  Rosenfeld,  "Thinning  Algorithms 
for  Grayscale  Pictures",  TR-610,  November  1977. 

Kenneth  C.  Hayes,  Jr.,  Martin  Herman,  and  Russell  Smith, 

"PDP-11  Image  Processing  Software",  TR-612,  December  1977. 

Gilbert  B.  Shaw,  "Local  and  Regional  Edge  Detectors:  Some 
Comparisons",  TR-614,  December  1977. 

David  L.  Milgram  and  Martin  Herman,  "Clustering  Edge  Values  for 
Threshold  Selection",  TR-617,  December  1977. 

David  L.  Milgram  and  Daryl  J.  Kahl,  "Recursive  Region 
Extraction",  TR-620,  December  1977. 

Yasuo  Nakagawa  and  Azriel  Rosenfeld,  "Some  Experiments  in 
Variable  Thrsholding" , TR-626,  January  1978. 

Martin  Herman,  "A  System  for  Control  Structure  Implementation 
for  Image  Understanding",  TR-646,  March  1978. 

b)  DARPA  Image  Understanding  Workshops:  Proceedings 

1 . April  20,  1977.  Minneapolis,  Minnesota. 

David  L.  Milgram,  "Region  Extraction  Using  Convergent 
Evidence",  pp.  58-64 

Durga  P.  Panda,  "Segmentation  of  FLIR  Images  by  Pixel 
Classification",  pp.  65-70 

Thomas  Willett  and  Nathan  Bluzer,  "Automatic  Target  Cueing  on 
the  Focal  Plane",  pp.  87-88 

Azriel  Rosenfeld,  "Algorithms  and  Hardware  Technology  for 

Image  Recognition  — Project  Status  Report  - March,  1977," 
pp.  98-100. 


5-2 


2 . October  20/21,  1977.  Palo  Alto,  California 


Azriel  Rosenfeld,  "Relaxation  Methods  - Recent  Developments", 
pp.  28-30. 

T.  Schutt,  G.  Borsuk  and  T.  J.  Willett,  "A  CCD  Histogram- 
Sorter:  Feasibility  Chip",  pp.  7-8. 

David  L.  Milgram,  "Progress  Report  on  Segmentation  Using 
Convergent  Evidence",  pp.  104-108. 

David  L.  Milgram,  Azriel  Rosenfeld,  "Algorithms  and  Hardware 
Technology  for  Image  Recognition,  Project  Status  Report- 
September  1977",  pp.  139-140. 

3 . May  3-4,  1978.  Cambridge,  Massachusetts 

T.  J.  Willett,  G.  E.  Tisdale,  "Hardware  Implementation  of  a 
Smart  Sensor:  A Review",  pp.  1-8. 

Azriel  Rosenfeld,  "Some  Recent  Results  Using  Relaxation-Like 
Processes",  pp.  100-104. 

David  L.  Milgram,  "Results  in  FLIR  Target  Detection  and 
Classification"  , pp.  118-124. 

Azriel  Rosenfeld,  David  L.  Milgram,  "Algorithms  and  Hardware 
Technology  for  Image  Recognition",  p.  142. 

c)  Other  Publications 

Thomas  Willett,  Nathan  Bluzer,  "Automatic  Target  Cueing  on 
the  Focal  Plane",  Proceedings  of  the  Seventh  Annual 
Symposium  on  Automatic  Imagery  Pattern  Recognition, 

May  23-24,  1977,  College  Park,  MD,  pp.  8-9. 


5-3 


David  L.  Milgram,  "Automatic  Object  Detection  in  FLIR  Images", 
Proceedings  of  the  International  Conference  on  Cybernetics 
and  Society,  September  19-21,  1977,  Washington,  D.C., 
pp.  617-621. 

Durga  P.  Panda,  "A  Method  of  Adaptive  Smoothing  and  Edge 

Enhancement",  Proceedings  of  the  International  Conference 
on  Cybernetics  and  Society,  September  19-21,  1977, 
Washington,  D.C.,  pp.  648-656. 

Charles  R.  Dyer,  David  L.  Milgram,  "VIEWMASTER  - A System  for 
Building  Image  Processing  Programs",  Proceedings  of  the 
Eighth  Annual  Symposium  on  Automatic  Imagery  Pattern 
Recognition,  April  3-4,  1978,  Gaithersburg,  MD,  pp.  170- 
179. 


5-4 


6.  Distribution  List 


NIGHT  VISION  LABORATORY 
DISTRIBUTION  LIST  FOR  TECHNICAL  REPORTS 

* 1 Copy  Unless  otherwise  specified 


Defense  Documentation  Center 

ATTN:  DDC-TCA 

Cameron  Station  (Bldg  5) 

Alexandria,  VA  22314 

*12  cys  unlimited  distribution 
2 cys  limited  or  classified 

Director 

National  Security  Agency 
ATTN:  TDL 

Fort  George  G.  Meade,  MD  20755 

Office  of  Naval  Research 
Code  427 

Arlington,  VA  22217 
Director 

Naval  Research  Laboratory 
ATTN:  Code  2627 

Washington,  DC  20375 

Commander 

Naval  Electronics  Laboratory  Center 

ATTN : Library 

San  Diego,  CA  92152 

Commander 

US  Naval  Ordnance  Laboratory 

ATTN:  Technical  Library 

White  Oak,  Silver  Spring,  MD  20910 

Commandant,  Marine  Corps 
HQ,  US  Marine  Corps 
ATTN : LMC 

Washington,  DC  20380 

HQ,  US  Marine  Corps 
ATTN : Code  INTS 

Washington,  DC  20380 


Command,  Control  & Communications  Div 
Development  Center 
Marine  Corps  Dev.  & Edu  Cmd 
Quantico,  VA  22134 

HQ,  ESD  (XRRI) 

L.  G.  Hanscom  Field 
Bedford,  MA  01730 

Air  Force  Avionics  Laboratory 
ATTN:  AFAL/TSR,  STINFO 

Wrignt-Patterson  AFB  OH  45433  *2 

AFSPCOMMCEN/SUR 

San  Antonio,  TX  78243 

Armament  Development  & Test  Center 
ATTN : DLOSL , Tech  Library 

Eglin  Air  Force  Base,  FL  32542 

HQ  DA  (DACE-CMS) 

Washington,  DC  20310 

OSASS-RD 

Washington,  DC  20310 
Commander 

US  Army  Training  & Doctrine  Command 

ATTN : ATCD-SI 

Fort  Monroe,  VA  23651 

Commander 

US  Army  Training  & Doctrine  Command 

ATTN:  ATCDOCI 

Fort  Monroe,  VA  23651 

Commander 

US  Development  & Readiness  Command 
ATTN : DRCMA-EE 

5001  Eisenhower  Ave 
Alexandria,  VA  22333 


6-1 


Commander  Commander 

US  Army  Development  & Readiness  Command  Frankford  Arsenal 
ATTN:  DRCRD-FW  ATTN:  (Dr.  Wm.  McNeill)  PDS 

5001  Eisenhower  Ave.  Philadelphia,  PA  19137 

Alexandria,  VA  22333 


Commander 

US  Army  Training  & Doctrine  Command 

ATTN:  ATCD-F 

Fort  Monroe,  VA  23651 

Commander 

US  Army  Missile  Command 

ATTN:  DRSMI-RR,  Dr.  J.P.  Hallowes 

Redstone  Arsenal,  AL  35809 

Commander 

US  Army  Armament  Command 
ATTN : DRSAR-RDP (Library ) 

Rock  Island,  IL  61201 

Commander 

US  Army  Combined  Arms  Combat 
Developments  Activity 
ATTN:  ATCAIC-IE 

Fort  Leavenworth,  KS  66027  *3 

Commander 

US  Army  Logistics  Center 
ATTN : ATCL-MA 

Fort  Lee,  VA  23801 

Commandant 

US  Army  Ordnance  School 
ATTN:  ATSOR-CTD 

Aberdeen  Proving  Ground,  MD  21005 
Commander 

US  Army  Intelligence  School 
ATTN:  ATSIT-CTD 

Fort  Sill,  OK  73503 

Commandant 

US  Army  Engineer  School 
ATTN:  ATSE-CTD-DT-TL 

Fort  Belvoir,  VA  22060 

Commander 
Picatinny  Arsenal 
ATTN:  SARPA-TS-S  #59 

Dover,  NJ  07801 


Commander 

USASA  Test  & Evaluation  Center 
Fort  Huachuca,  AZ  85613 

US  Army  Research  Office  - Durham 
ATTN:  CRDARD-IP 

Box  CM,  Duke  Station 
Durham,  NC  27706 

Commander 
HQ  MASSTER 

Technical  Information  Center 
ATTN:  Mrs.  Ruth  Reynolds 

Fort  Hood,  TX  76544 

USA  Security  Agency 
ATTN : IARD 

Arlington  Hall  Station 
Arlington,  VA  22212 

Commander 

US  Army  Tank  Automotive  Command 
ATTN:  DRSTA-RW-L 

Warren,  MI  48090 

Commandant 

US  Army  Air  Defense  School 
ATTN:  C&S  Dept,  Msl  Sci  Div 

Fort  Bliss,  TX  79906 

Commander 

US  Army  Combined  Arms  Combat 
Developments  Activity 
ATTN : ATCACC 

Fort  Leavenworth,  KS  66027 
Commander 

US  Army  Yuma  Proving  Ground 
ATTN:  STEYP-MTD  (Tech  Library) 

Yuma,  AZ  85364  *2 

Commander 

US  Army  Arctic  Test  Center 

ATTN : STEAC-PL 

APO  Seattle,  WA  98733 

f 


6-2 


Commander 

US  Army  Tropic  Test  Center 
ATTN:  STETC-MO-A  (Tech  Library) 

Drawer  942 

Ft.  Clayton,  Canal  Zone  09827 
Commander 

US  Army  Logistics  Center 
ATTNi  ATCL-MC 
Ft.  Lee,  VA  22801 

Directorate  of  Combat  Developments 
» US  Army  Armor  School 
ATTN:  ATSB-CD-AA 

Ft.  Knox,  KY  40121 

Commandant 

US  Army  Inst  for  Military  Assistance 
ATTN:  ATSU-CTD-OMS 

Ft.  Bragg,  NC  28307 

Commander 

US  Army  Missile  Command 
ATTN:  DRSMI-RE  (Mr.  Pittman) 

Redstone  Arsenal,  AL  35809 

Commander 

US  Army  Systems  Analysis  Agency 
ATTN:  DRXSY-T  (Mr.  A.  Reid) 

Aberdeen  Proving  Ground,  MD  21005 

Commandant 

US  Army  Signal  School 
ATTN : ATSN-CTD-MS 

Ft.  Gordon,  GA  30905 

Commander 

US  Army  Tank  Automotive  Command 
ATTN:  DRSTA-RHP  (Dr.  J.  Parks) 

Warren,  MI  48090 

* NASA  Scientific  & Tech  Info  Facility 
ATTN:  Acquisitions  Br.  (S-AK/DL) 

PO  Box  33 

, College  Park,  MD  20740  *2 

Advisory  Group  on  Electron  Devices 
201  Varick  St,  9th  Floor 
New  York,  NY  10014 

Ballistic  Missile  Radiation  Anal  Ctr 
Env.  Research  Insti.  of  Michigan 
Box  618 

Ann  Arbor,  MI  48107 


Chief 

Ofc  of  Missile  Electronic  Warfare 

Electronic  Warfare  Lab,  ECOM 

White  Sands  Millile  Range,  NM  88002  *2 

Chief 

Intel  Materiel  Div  & Support  Ofc 
Electronic  Warfare  Lab,  ECOM 
Ft.  Meade,  MD  20755 

Commander 

UA  Army  Electronics  Command 

ATTN:  DRSEL-MS-TI 

Fort  Monmouth,  NJ  07703 

TACTEC 

Battelle  Memorial  Institute 
505  King  Avenue 
Columbus,  OH  43201 

Commander 

US  Army  Electronics  Command 

ATTN:  DRSEL-PL-ST 

Fort  Monmouth,  NJ  07703 

Study  Center 

National  Maritime  Research  Center 
ATTN : Rayma  Feldman 

King's  Point,  NY  11024 

IR  Information  & Analysis  Ctr 

PO  Box  618 

ATTN : Library 

Ann  Arbor,  MI  78107 

Advanced  Research  Projects  Agency 
ATTN:  IPTO  (Ma j . Carlstrom) 

1400  Wilson  Blvd. 

Arlington,  VA  22209  *4 

Commanding  General 

US  Army  Development  & Readiness  Command 
Attn:  DRXSP  (B.  Chasnov) 

5001  Eisenhower  Ave 
Alexandria,  VA  22333 

Commanding  Officer 

Naval  Underwater  Center 

ATTN:  Code  6003  (H.  Whitehouse) 

San  Diego,  CA 


6-3 


Commanding  Officer  M*-*  Stein  Grinaker 

US  Air  Force  Avionics  Lab  Norwegian  Defense  Research 

Bldg  620,  ATTN:  AFAL/AAT  (Dr.  N.  Griswold)  Establishment 

Wriqht- Patterson  AFB,  Ohio  45433  Division  for  Weapons  and  Equipment 

PBox  25 

US  Army  Research  Office  N-2007  Kjeller,  Norway 

ATTN:  Dr.  J.  Suttle 

Box  CM,  Duke  Station 
Durham,  NC  27706 

Director 

Night  Vision  Laboratory 
ATTN:  DRSEL-NV-VI  (Dehne) 

Ft.  Bel voir , VA  22060  *27/1 


Director 

Night  Vision  Laboratory 
ATTN:  DRSEL-NV-FIR  (Kelso) 

Ft.  Belvoir , VA  22060  *15 

Director 

Night  Vision  Laboratory 
ATTN:  DRSEL-NV-D  (Fulton) 

Ft.  Belvoir,  VA  22060  *12 


Dr.  Ian  Henderson 

Defense  Research  Establishment  Ottawa 
National  Defense  H.Q. 

Ottawa,  Canada 

Dr.  Lewis  Minor 
U.S.  Army  Missile  Research  and 
Development  Command 
DRDMI-TEO 
Redstone  Arsenal 
Huntsville,  Alabama  35809 


Dr.  Claude  Roche 

Establissement  Technique  Central 
de  L'Armement 

16  bis.  Avenue  Prieur  de  la 
Cote  d'Or. 

94114  Arcueil,  France 

Mr . Lars  Brock-Nannestad 

Danish  Defense  Research  Establishment 

Osterbrogades  Kaserne 

DK-2100  Copenhagen,  O,  Denmark 

Dr.  Manfred  Bohner 
FIM/FGAN 

Breslauer  Strasse  48 

7500  Karlsruhe  1 

Federal  Republic  of  Germany 

Mr.  Fons  M.  Navarro 
Physics  Laboratory  TNO 
Oude  Waalsdorperweg  63 
The  Hague,  Netherlands 


6-4 


UNCLASSIFIED 


SECURITY  CLASSIFICATION  OF  THIS  PAGE  (Wi.n  D.(.  Entered) 


REPORT  DOCUMENTATION  PAGE 

READ  INSTRUCTIONS 

BEFORE  COMPLETING  FORM 

1*  report  number 

2.  GOVT  ACCESSION  NO. 

3.  RECIPIENT’S  CATALOG  NUMBER 

4.  TITLE  end  Submit) 

ALGORITHMS  AND  HARDWARE  TECHNOLOGY 

5.  TYPE  OF  REPORT  & PERIOD  COVERED 

Final 

1 May  1976-31  Jan.  1978 

FOR  IMAGE  RECOGNITION 

6.  PERFORMING  ORG.  REPORT  NUMBER 

7.  AUTHOR'#) 

David  L.  Milgram  Thomas  J.  Willett 

Azriel  Rosenfeld  Glenn  E.  Tisdale 

(Univ.  of  Md. ) (Westinghouse  Corp.) 

B.  CONTRACT  OR  GRANT  NUMBERf.) 

DAAG53-76-C-0138 

7 

• 

a performing  organization  name  and  aooress 

Computer  Science  Center 

University  of  Maryland 

College  Park,  MD  20742 

10.  PROGRAM  ELEMENT.  PROJECT.  TASK 

AREA  A WORK  UNIT  NUMBERS 

11.  CONTROLLING  OFFICE  NAME  AND  ADDRESS 

U.  S.  Armv  Niaht  Vision  Labnrafnrv 

12.  REPORT  DATE 

31  March  1978 

Fort  Belvoir,  VA  22060 

13.  NUMBER  OF  PAGES 

291 

U.  MONITORING  AGENCY  NAME  a ADDRESSfJf  different  from  Controlling  Office ) 

15.  SECURITY  CLASS,  (ol  1 hit  report. 

Unclassified 

15#  DECLASSIFI  CATION /DOWN  GRADING 
SCHEDULE 

16.  DISTRIBUTION  STATEMENT  (of  thle  Report) 

Approved  for  public  release;  distribution  unlimited. 

17.  DISTRIBUTION  STATEMENT  (of  the  abstract  entered  In  Block  20,  It  different  from  Report) 

* 

18.  supplementary  NOTES 

\ 

I9r  KEY  WORDS  (Continue  on  reverse  side  If  necessary  and  Identify  by  block  number) 

Image  understanding 

Pattern  recognition 

Target  detection 

FLIR  imagery 

20.  ABSTRACT  (Continue  on  reverse  side  It  necessary  and  Identify  by  block  number) 

Algorithms  for  detecting  and  classifying  targets  on  forward- 
looking  infrared  (FLIR)  imagery  have  been  developed.  The  imple- 
mentability  of  these  algorithms  in  charge-coupled  device  (CCD) 
technology  has  been  investigated,  and  one  basic  function,  sorting, 
has  been  successfully  implemented.  The  recommended  algorithms 
had  a detection  rate  above  95%  and  a false  alarm  rate  of  1 to  2 
per  frame. 

DD  I j?NM7J  1473  EDITION  OF  > NOV  «S  IS  OBSOLETE  UNCLASSIFIED 


SECURITY  CLASSIFICATION  OF  THIS  PACE  (When  Data  Entered) 


SECURITY  CLASSIFICATION  OF  THIS  PAGEfWh«»n  Dutu  Enter'd) 


SECURITY  CLASSIFICATION  OF  THIS  PAGEfW>*n  D*f  Enter'd) 


J 


t 


